Preface

Some adventures, such as scaling the highest mountain or flying the fastest aircraft,
go down in history as great events. Others, however, go unnoticed, deemed unworthy
of a single line in the newspapers, in spite of being much more significant. One of
these is the taming of chance: using numbers alone, it is possible to predict what will
most likely happen. This is a formidable and unfinished discovery, but one that has a
significant impact on everyday life. In this book we invite the reader to follow the path
travelled by humanity throughout history as it has sought to manage chance and
embrace the unpredictable.

Making the leap from believing that only the gods know the future, and that little
more than magic and ritual can put us in contact with what lies beyond, to quantifying
the probabilities with which events occur has required formidable effort. Today it is
possible to predict with reasonable accuracy the result of an election before it takes
place, the probability of suffering an illness before we are examined, or how long a
low-energy light bulb will last.

Furthermore, in the context of human history, this ability has been won only
recently. Even if a large part of the mathematics we use is ancient (some of it, such as
the Euclidean geometry we learn in school, is more than 20 centuries old), the important
aspects of probability that we use today are scarcely a century old. We shall witness
the first steps in the understanding of uncertainty and, hand in hand with gamblers,
whose pastime has been reviled throughout the ages, we will come to see that not
everything is equally probable. Immensely talented figures such as Pascal and Fermat
will appear at the beginning of the story. We shall see how the analysis of the errors
that inevitably arise when carrying out repeated measurements made it possible to
discover the law that governs the distribution of many other variables in both technical
and social phenomena. This law is frequently referred to as the ‘normal distribution’
and is represented by a beautiful curve, the Gauss bell curve, one of mathematics’
most iconic images.

We shall come across games, quantify the difficulty of guessing their outcomes
(which requires us to be able to count correctly) and calculate the average amount
we shall lose on the lottery. This leads us to the ‘mathematical expectation’, which
forms the basis for calculating everyday values such as insurance premiums, another
element in the adventure of conquering chance. Along the way we will discover many
surprises in situations with unexpected probabilities.


This book is a call to study and reflect on chance. Contrary to the ideal of the
Enlightenment, we inhabit an island of certainty in an ocean of uncertainty. To understand
the world, we must educate ourselves in the realm of chance, one of the last territories
remaining to be conquered, and one which causes so much unease in our society of
certainties. We believe the path we propose, full of challenges, discoveries and
surprises, is a good place to start.
Chapter 1


The Art of Counting Correctly


First steps 


In this first chapter we shall review the basic concepts of a unique and extremely 
important skill: the art of counting. The calculation of all possibilities that may arise 
has a wide range of applications. For how many years can certain car registration 
systems be used? How many possible combinations of numbers occur in lotteries? 
How many different ways are there of combining my clothes? 

When it comes to answering these and similar questions, we can always make use
of the tried and tested method of ‘counting manually’. However, mathematics has long
since developed combinatorics for the purpose of counting the number of objects, or
groups of objects, in situations such as those described above without having to
enumerate them one by one. These problems have common features, making it possible
to define mathematical models, known as combinatorial models, for studying them all.
Once these models are known, solving a specific problem is simply a matter of applying
the corresponding formula. Let’s now consider an example.


Selections in the workplace 


We want to select two representatives, each with a specific role (let’s say a delegate
and a secretary), for a complicated negotiation with management. If there are 25 people,
all of whom have a vote and all of whom are candidates, how many different ways
can the choice be made? First let us consider the delegate. There are 25 candidates,
hence 25 ways of making the choice. For each delegate that is chosen, there are 24
ways of choosing the secretary, giving 25 · 24 = 600 ways of making the selection.

What happens if we do not need to distinguish between the delegate and the
secretary? How many ways are there of making the choice now? As there are no
‘positions’ in the commission, if we count in the same way as we did above, each
pair that was selected will be counted twice: it makes no difference if Mary comes
first, followed by John, or if they are the other way around. The number of different
ways is now:

$$\frac{25 \cdot 24}{2} = \frac{600}{2} = 300.$$
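Both counts can be checked by brute force. The following is a minimal sketch that assumes the 25 people are simply numbered 0 to 24 (a hypothetical labelling) and uses Python's standard `itertools` module to list every ordered (delegate, secretary) pair and every unordered pair.

```python
from itertools import combinations, permutations

people = range(25)  # hypothetical labels for the 25 candidates

# Ordered pairs (delegate, secretary): the same person never holds both roles
ordered = list(permutations(people, 2))

# Unordered pairs: {Mary, John} is the same selection as {John, Mary}
unordered = list(combinations(people, 2))

print(len(ordered))    # 600
print(len(unordered))  # 300
```

Each unordered pair appears twice among the ordered ones, once in each order, which is exactly why the division by 2 is needed.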


These situations, and others of a similar nature, in which it is necessary to calculate
the number of possibilities, occur frequently in everyday life. In general, when it 
comes to counting them, we do not enumerate all the possibilities one by one, 
but make use of general procedures for calculating the total number. This is called 
combinatorics, the goal of which is to study the different groupings and orderings 
that can be made from a series of objects, regardless of their nature. Typical problems 
include the ways of selecting a sample from a set of objects, the ways of placing a 
number of objects into a certain number of boxes, or the ways of dividing a set into 
smaller parts. The models for these problems are described as variations, combinations 
and permutations. 

However, before beginning to count, it is important to ensure the information is 
organised correctly. We are going to use an important and highly versatile resource 
(which we will later use for probabilities): tree diagrams or simply, trees. 

Graphs are an extremely useful mathematical tool for symbolising relationships
between objects. Graphs represent the objects by means of points (the vertices of the
graph) and the relationships between them as lines (the edges of the graph). Tree
diagrams are a simple type of graph in which any two vertices are joined by exactly
one path, so there are no closed loops. They start at a point $P_1$, the origin of the tree
or initial vertex, which gives rise to a series of edges that join it to other points (three
of them in our example). Generally speaking, these vertices are also connected to others,
and so on. Each branch of the tree ends in a terminal edge whose terminal vertex has
no further edges. The tree in our example has four terminal vertices, as can be seen
below:


[Figure: example of a tree diagram.]
P. 
Let us consider an example. There are three pairs of trousers in my wardrobe:
grey (GT), blue (BT) and red (RT); two shirts, blue (BS) and white (WS); and two
jumpers, one blue (BJ) and the other yellow (YJ). Every day I choose one of each
item of clothing. How many different outfits are there if I wish to avoid wearing
two items of the same colour?

Let us draw a tree with all the possibilities and establish an order for selection 
— first, trousers; then shirt; and finally jumper. In terms of trousers, we have three 
options: grey, blue and red; hence we will draw three edges from the origin of the 
tree and write the corresponding options at the ends: 


[Tree diagram: three edges from the origin, ending at GT, BT and RT.]


Now let us take the final vertex of the first edge (GT): we have chosen the grey
trousers. When it comes to the shirt, we now have two options, since neither shirt is
grey. Hence there are two edges:


[Tree diagram: GT branches into BS and WS; BT and RT have not yet been expanded.]


Let us follow the vertex of the second edge (BT). We have chosen the blue 
trousers. Since it is not possible to repeat colours, there is only one possible choice 
of shirt — white (WS). Hence we add one edge: 


[Tree diagram: GT branches into BS and WS; BT branches into WS; RT has not yet been expanded.]


On the vertex of the third initial edge (RT, red trousers) there are two options 
for the shirt, since it is not possible to repeat the colour. Hence we add two edges 
to the tree: 


[Tree diagram: GT branches into BS and WS; BT into WS; RT into BS and WS.]
The tree now reflects the possibilities for choosing trousers and a shirt, but is
missing the jumper. To complete it, we follow a similar procedure, analysing the new
vertices we have drawn. For the vertex BS of the branch GT-BS (grey trousers and
blue shirt), the choice of jumper is limited to a single option, the yellow one (YJ), so
as to avoid repeating a colour. Hence, we add an edge to the tree (YJ), which terminates
this branch. The branch represents the option ‘grey trousers, blue shirt and yellow
jumper’. Following the procedure for all the branches gives the complete tree:


[Complete tree diagram. Its seven branches are GT-BS-YJ, GT-WS-BJ, GT-WS-YJ, BT-WS-YJ, RT-BS-YJ, RT-WS-BJ and RT-WS-YJ.]


The full solution to the problem is the number of branches of the tree, in this 
case seven. Drawing this tree may seem more complicated than manually counting 
the possibilities, but the procedure can be applied to many situations with good 
results. The following figure shows the full schema: 


[Figure: the successive stages in the construction of the tree, from the three choices of trousers to the complete tree.]
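The same count can be reproduced in a few lines of code. This is a minimal sketch, not a method from the text: it generates every trousers-shirt-jumper combination in the same order as the tree (trousers first, then shirt, then jumper) and keeps only those with three different colours. Unlike the tree, which prunes a branch as soon as a colour repeats, the sketch builds all 12 combinations and filters them afterwards.

```python
from itertools import product

# (code, colour) pairs, using the codes from the text
trousers = [("GT", "grey"), ("BT", "blue"), ("RT", "red")]
shirts = [("BS", "blue"), ("WS", "white")]
jumpers = [("BJ", "blue"), ("YJ", "yellow")]

outfits = []
# product() runs through trousers, then shirt, then jumper, like the tree
for (t, t_col), (s, s_col), (j, j_col) in product(trousers, shirts, jumpers):
    if len({t_col, s_col, j_col}) == 3:  # keep only outfits with no repeated colour
        outfits.append(f"{t}-{s}-{j}")

print("\n".join(outfits))
print(len(outfits))  # 7
```

The seven outfits are printed in the same order as the branches of the complete tree.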


We shall now describe the basic principles of a counting process. 
The basic counting principle or the multiplication principle


This calculating strategy, or general principle of counting, establishes that given
two experiments, one of which has m possible results and the other n, there are a
total of m · n possible results for both experiments. In terms of sets, if one set has m
elements and the other n, there are m · n ways of choosing pairs of elements with
one from each set.

In general terms, we can state this basic principle as follows. If k experiments are
carried out, the first of which can give rise to $n_1$ results, the second to $n_2$, and so on
all the way to $n_k$, we have a total of

$$n_1 \cdot n_2 \cdots n_k$$

possible results.
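A small numerical check of the principle may help. The sketch below lists every joint result of three experiments with 3, 2 and 2 possible results and compares the count with the product $n_1 \cdot n_2 \cdot n_3$. The experiments reuse the wardrobe codes purely as hypothetical labels; `math.prod` requires Python 3.8 or later.

```python
from itertools import product
from math import prod

# Hypothetical experiments with 3, 2 and 2 possible results
experiments = [["GT", "BT", "RT"], ["BS", "WS"], ["BJ", "YJ"]]

# Listing every joint result versus multiplying n1 * n2 * n3
listed = sum(1 for _ in product(*experiments))
predicted = prod(len(results) for results in experiments)

print(listed, predicted)  # 12 12
```

The wardrobe example is a useful contrast. With no colour restriction the principle gives 3 · 2 · 2 = 12 outfits. The restriction makes the number of shirt and jumper options depend on the earlier choices, which is why the tree was needed to arrive at 7.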


The pigeonhole principle or Dirichlet's principle 


In its simplest form, this principle states that if three pigeons roost in only two
pigeonholes, at least one of the pigeonholes must hold more than one pigeon. This
simple deduction, which applies whenever the number of pigeons is greater than the
number of pigeonholes, forms the basis of many counting problems.

Imagine we need to distribute m objects in n boxes. If m is divisible by n, we can 
put, for example, m/n objects in each box. 

However, obviously m will not always be divisible by n and nor will we always 
wish to put the same number of objects in each box. Based on the pigeonhole 
principle, we can state that: 


— If we have m objects distributed in n boxes, where m>n, one of the boxes 
will have at least two objects. 

— If we have m objects distributed across n boxes (where m is not a multiple of
n), one of the boxes will have at least p + 1 objects, where p is the quotient of
the integer division of m by n.


For example, if Paul sent 29 urgent letters last week, we can be sure he sent at 
least five on one day. The argument is simple — we are trying to distribute 29 objects 
(letters) across seven boxes (days of the week). Regardless of the distribution, it 
will be necessary to put more than four objects in one of the boxes. If we put four 
objects in each box there will be one left over and hence it will be necessary to put 
five in one of the boxes. 
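The general rule can be written as a one-line function. The following is a minimal sketch with a hypothetical function name: it returns p + 1 when m is not a multiple of n, and m/n when it is, which amounts to rounding m/n up.

```python
def min_in_fullest_box(m: int, n: int) -> int:
    """Least number of objects the fullest of n boxes must hold when m objects are placed."""
    p, remainder = divmod(m, n)  # p is the quotient of the integer division
    return p + 1 if remainder else p

print(min_in_fullest_box(29, 7))  # 5: Paul's letters over the seven days
print(min_in_fullest_box(3, 2))   # 2: three pigeons, two pigeonholes
```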
THE SAME NUMBER OF HAIRS


The pigeonhole principle allows us to be sure that in a city with one million
inhabitants, there are at least two inhabitants with the same number of hairs,
without, of course, needing to count the hairs on anyone's head.
Imagine the human head as a sphere. We measure its diameter, calculate
the surface area of the cranium (which is just over half the total surface) and
observe the density of hairs per unit of surface (e.g. one hair per
square millimetre). This allows us to deduce that the number of hairs
is roughly equal to the number of square millimetres on our head. Those who
have done the calculation assure us that it would be impossible to
have more than 200,000 hairs on our heads. However, to be on the
safe side, we shall assume there can be as many as 250,000. Hence,
there can be at most 250,000 different ‘pigeonholes’ (the number of
hairs), and since there are one million inhabitants in the city, we can
be sure many will have the same number of hairs. We can even
guarantee that there are at least 750,000 people whose number of hairs
coincides with that of at least one other inhabitant of the city: at most
250,000 people can occupy a pigeonhole on their own, so the remaining
750,000 must share a pigeonhole with someone else. More generally,
in any city with more than 250,000 inhabitants, at least two people are
guaranteed to have the same number of hairs.
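The bound of 750,000 can be checked on a scaled-down city. This sketch assumes a hypothetical city of 1,000 inhabitants whose hair counts are drawn at random from 250 possible values, and counts how many people share their value with at least one other person. However the counts are assigned, at most 250 people can be alone in their pigeonhole, so at least 1,000 − 250 = 750 must share.

```python
import random
from collections import Counter

def people_sharing(hair_counts):
    """Number of people whose hair count equals that of at least one other person."""
    tally = Counter(hair_counts)
    return sum(k for k in tally.values() if k >= 2)

# Hypothetical scaled-down city: 1,000 inhabitants, at most 250 distinct hair counts
random.seed(0)
population, pigeonholes = 1_000, 250
hair_counts = [random.randint(1, pigeonholes) for _ in range(population)]

# Whatever the assignment, at most 250 people can be alone in their pigeonhole
print(people_sharing(hair_counts) >= population - pigeonholes)  # True
```

The random assignment only illustrates the argument; the inequality holds for every possible assignment, not just the simulated one.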


Combinatorial problems 


There are two types of situations of particular interest because they are so common:
selecting samples and putting objects in boxes.


Selecting samples 


This involves calculating how many different ways there are of selecting a sample, or
set of elements, from a collection of objects. In general, we know the total number of
objects and the number of objects in the sample (the sample size). We must consider
various situations:
— Whether the objects are identical or different.

— Whether the sample can contain repeated elements.

— Whether it is necessary to consider the order in which the elements are selected,
that is, whether the sample is ordered.


Returning to our first example of elections in the workplace, the problem involves 
finding how many different samples of size two can be taken from a collection of 
25 different elements, in which the elements cannot be repeated (two different 
representatives), and the order in which they are selected is taken into account, since 
their roles are different. 


Putting objects in boxes 


In this case, we need to count the number of ways of putting a certain number of 
objects into a given number of boxes. A number of different situations may arise: 


— The objects are the same or different. 

— The boxes are the same or different. 

— It is possible to put more than one object in each box.

— Boxes can be left empty. 

— The order in which the objects are placed in the boxes should be taken 
into account. 


In a race with eight athletes, how many different ways are there of sharing out
the three medals? We must determine the number of ways of putting three different
objects (the medals) into eight different boxes (the athletes), with at most one object
in each box (each athlete can receive only one medal). There are 8 boxes available for
the first object. Once the box for the first object has been chosen, there are 7 remaining
for the second. Once the boxes for the first two objects have been chosen, there are 6
remaining for the third. The total number of possibilities is 8 · 7 · 6 = 336: the three
medals can be shared out in 336 different ways.
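A brute-force listing confirms the count. In the sketch below the athletes' names are hypothetical, and each podium is treated as an ordered triple (gold, silver, bronze). `math.perm` (Python 3.8 or later) gives the same product 8 · 7 · 6 directly.

```python
from itertools import permutations
from math import perm

athletes = ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"]  # hypothetical names

# Each podium is an ordered triple: (gold, silver, bronze)
podiums = list(permutations(athletes, 3))

print(len(podiums))  # 336
print(perm(8, 3))    # 336, i.e. 8 * 7 * 6
```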

Combinatorial mathematical models allow us to solve problems of this nature.
We can also see how the same model can be used for the two different types of
problems, which allows us to establish analogies between them. However, these
models do not cover every situation that can arise in such problems; in those cases
we must resort to other techniques.
Permutations and factorials


To permute is to reorder a series of objects. Permutations are the different ways of
ordering the n elements of a set. The number of permutations of n objects, $P_n$, is
the number of different ways in which they can be ordered. This number is easily
found. We can choose the first element from among the n that are available; there
are n − 1 possibilities for the second; for the third, there will be n − 2, and so on. The
number of permutations of n different elements is:

$$P_n = n \cdot (n-1) \cdot (n-2) \cdots 3 \cdot 2 \cdot 1.$$

The product that gives us the number of permutations of n elements is referred
to as the factorial of n or n factorial, and is written as n!:

$$n! = n \cdot (n-1) \cdot (n-2) \cdots 3 \cdot 2 \cdot 1 = P_n.$$


VERTIGINOUS GROWTH 


As the value of n increases, the value of n! grows much more quickly than is often expected.
We can see this by carrying out some calculations on a calculator. For example, 5! = 120 and
10! = 3,628,800 are manageable numbers; however, 20! = 2,432,902,008,176,640,000 ≈ $2.4 \cdot 10^{18}$
has 19 digits, and 50! ≈ $3.04 \cdot 10^{64}$ has no fewer than 65 digits. Although 0! has no meaning
as a product of factors, to simplify our calculations we define 0! = 1.
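Python's integers have arbitrary precision, so the exact values quoted above can be reproduced and their digits counted. This is a minimal sketch using `math.factorial` from the standard library; the digit count is simply the length of the decimal string.

```python
from math import factorial

# Number of decimal digits of n! for the values quoted in the text
digits = {n: len(str(factorial(n))) for n in (5, 10, 20, 50)}

for n, d in digits.items():
    print(f"{n}! = {factorial(n)} ({d} digits)")
print("0! =", factorial(0))  # 1, by definition
```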


If we need to enumerate all the permutations, we can use the same strategy we
used for calculating their number: take the first element and analyse the different
possibilities we have for the second. For each of these, consider all the possibilities
for choosing the third element, and so on. The possible permutations of 1, 2, 3 and
4 (i.e. all the different four-digit numbers that can be made with them) are shown
in the table:

1234   2134   3124   4123
1243   2143   3142   4132
1324   2314   3214   4213
1342   2341   3241   4231
1423   2413   3412   4312
1432   2431   3421   4321
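The same strategy is what `itertools.permutations` follows: for a sorted input it produces the orderings in lexicographic order, fixing the first element and then running through all the possibilities for the rest. The sketch below prints them in the layout of the table, one column per leading digit.

```python
from itertools import permutations

# permutations() of a sorted input yields its results in lexicographic order
numbers = ["".join(p) for p in permutations("1234")]

# Print as a table with one column per leading digit (6 rows x 4 columns)
for row in range(6):
    print("   ".join(numbers[row + 6 * col] for col in range(4)))
print(len(numbers))  # 24 = 4!
```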
MENUS AND GUESTS


The number of ways of putting n different objects into n different boxes, with exactly one
object in each box, is also given by the permutations of n elements. Given seven
different menus, how many ways can they be distributed over the seven days of the week
to give weekly menus? The result is given by the permutations of seven elements:
$P_7 = 7! = 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 5{,}040$, more than five thousand!

That is certainly a lot of menus, but other situations give far larger numbers. We are all
familiar with Leonardo da Vinci's painting the Last Supper, which represents Jesus and his
12 disciples. If Leonardo had decided to make copies of the painting, changing the positions
of the guests, he would have had to paint 13! = 6,227,020,800 paintings, more than 6,000
million! If he had decided always to place Jesus in the centre, he would have saved some
work, but he would still have had to paint a considerable number, 12! = 479,001,600, or
rather more than 479 million paintings. Even if Leonardo had enlisted many of his students
to help him, the task would seem, if not impossible (since we are dealing with a finite
number), certainly extremely difficult.


The Last Supper, by Leonardo da Vinci, painted between 1495 and 1497 on a wall 
of the refectory in the convent of Santa Maria delle Grazie in Milan. 


When the elements of the permutation have a natural order (letters of the alphabet,
numbers, words, etc.), we use the term permutation run for the permutation in which
the elements are arranged in that natural order. The situation becomes more complicated
when some of the elements in the permutation are the same.
Suppose we want to know how many different ways there are of ordering the
letters of the word AMAPOLA, which has seven letters. If they were all different,
the number would be $P_7 = 7! = 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 5{,}040$. However, when we swap
two letters that are the same, we get the same word, so there are not as many distinct
orderings. If we keep the letters M, P, O and L fixed, the 3 As can be arranged among
their three positions in $P_3 = 3 \cdot 2 \cdot 1 = 6$ ways, all of which give the same word. This is
the case regardless of where the letters M, P, O and L are located, meaning that the
total number of permutations will be:

$$\frac{7!}{3!} = \frac{5{,}040}{6} = 840.$$


Under these circumstances, we talk of permutations with repetition. In general, if
we have a collection of n objects, in which there are a copies of object A, b copies
of B, ..., z copies of Z (a + b + ... + z = n), the total number of permutations with
repetition is:

$$\frac{n!}{a! \cdot b! \cdots z!}$$


Imagine a chess player who wishes to place two white pawns and four black ones 
in a row. How many ways can this be done? The answer is given by the permutations 
with repetition of six objects, one of which is repeated twice and the other four times. 


$$\frac{6!}{2! \cdot 4!} = \frac{720}{2 \cdot 24} = 15.$$
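The formula for permutations with repetition can be compared with a brute-force count. The following sketch (the function name is hypothetical) divides n! by the factorial of each letter's multiplicity, then checks the AMAPOLA result by generating all 5,040 orderings and keeping only the distinct ones. The pawns are encoded, as an assumption of the sketch, as the string "WWBBBB" (two white, four black).

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod

def multiset_permutations(word: str) -> int:
    """n! / (a! * b! * ... * z!), where a, b, ..., z are the letter counts."""
    counts = Counter(word)
    return factorial(len(word)) // prod(factorial(k) for k in counts.values())

# Brute-force check: distinct orderings among the 5,040 raw orderings of AMAPOLA
distinct = set(permutations("AMAPOLA"))
print(len(distinct), multiset_permutations("AMAPOLA"))  # 840 840

# Two white (W) and four black (B) pawns in a row
print(multiset_permutations("WWBBBB"))  # 15
```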


Variations 


Two of the examples above (selecting a delegate and a secretary, and distributing 
three medals among eight athletes) have common features: 


— In both cases, it is necessary to select a sample (2 from 25, and 3 from 8). 

— It is necessary to take into account the order of the selection (choosing Mary
as delegate and John as secretary is not the same as the reverse, and likewise for
the gold, silver and bronze medals).

— The elements cannot be repeated (two positions or prizes cannot be given
to the same person).
In both cases, we are dealing with variations. Given a set of m different elements,
we use the term variations of order n, or variations of m elements choose n, to refer
to all the groupings of n of the m different elements without repetition, in which two
variations are different if they differ in one of their elements or in their order. This is
written as $V_{m,n}$.

Consider the set 1, 2, 3, ..., m. Let’s list the variations of order n of these m elements
and calculate their number. It is clear that if we consider each of the elements
in isolation, we have a variation of order 1:

1   2   3   ...   m

Hence the number of such variations is $V_{m,1} = m$.


Adding a different element of the set to the right of each of the variations of 
order 1 gives the variations of order 2: 


12   13   14   ...   1m
21   23   24   ...   2m
...
m1   m2   m3   ...   m(m−1).


Hence, the number of variations of order 2 is the result of multiplying the
number of variations of order 1 by the number of elements that can be added to
each of them, which is m − 1. Hence, $V_{m,2} = m \cdot (m-1)$.

Adding a different element to the right of each variation of order 2 gives the
variations of order 3. The number of such variations is the result of multiplying $V_{m,2}$
by the number of elements that can be added to each of them, which is m − 2. Hence,
$V_{m,3} = V_{m,2} \cdot (m-2) = m \cdot (m-1) \cdot (m-2)$.


Continuing with this process gives: 


$$V_{m,4} = m \cdot (m-1) \cdot (m-2) \cdot (m-3)$$
$$V_{m,5} = m \cdot (m-1) \cdot (m-2) \cdot (m-3) \cdot (m-4)$$
$$\vdots$$

In general:

$$V_{m,n} = m \cdot (m-1) \cdot (m-2) \cdots (m-n+1).$$
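The construction above, one factor per added element, translates directly into a loop. This is a sketch with a hypothetical function name; it compares the product with a direct listing of the variations of 1, 2, ..., 6 and with `math.perm`, which computes the same number (Python 3.8 or later).

```python
from itertools import permutations
from math import perm

def variations(m: int, n: int) -> int:
    """V(m, n) = m * (m - 1) * ... * (m - n + 1), built up one factor at a time."""
    result = 1
    for k in range(n):
        result *= m - k  # one element fewer is available at each step
    return result

# Compare the product with a direct listing of the variations of 1, 2, ..., 6
for n in range(1, 5):
    listed = sum(1 for _ in permutations(range(1, 7), n))
    print(n, variations(6, n), listed, perm(6, n))
```

Each printed line shows the same value three times: 6, 30, 120 and 360 for orders 1 to 4.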
ANOTHER FORMULA
According to the definition of a factorial,
$$m! = m \cdot (m-1) \cdot (m-2) \cdots (m-n+1) \cdot (m-n) \cdot (m-n-1) \cdots 3 \cdot 2 \cdot 1 = V_{m,n} \cdot (m-n)!.$$
Hence $V_{m,n}$ can also be calculated using the following formula:
$$V_{m,n} = \frac{m!}{(m-n)!}.$$
The permutations of n elements are the variations that can be formed by choosing n of the n objects,
that is, by choosing them all. Hence $P_n = V_{n,n}$ (remember that 0! = 1).


The previous situation can also be considered in terms of putting objects into 
boxes. Let us consider the two previous problems once again: 


— The first case corresponds to distributing 2 different objects (the positions) 
in 25 different boxes (the candidates), and the second corresponds to sharing 
out 3 different objects (the medals) in 8 different boxes (the athletes). 

— It is not possible to put more than one object in each box. 

— The order of the objects must be taken into account. 


Generalising these observations, we can say that the ‘different ways of putting n
different objects into m different boxes, such that no box contains more than one
object, are the variations of m elements choose n’.

Allowing elements to be repeated when making up the variations gives the variations
with repetition. Given a set of m elements, the variations with repetition of order n are
all the groupings of n of the m elements in which elements may be repeated, and in
which two variations are regarded as different if they differ in at least one element or
in the order of their elements. This is written as $VR_{m,n}$.

Reasoning as we did to find $V_{m,n}$, with the only difference being that at each step
after the first we can again choose from all m elements (since they can be repeated),
leads easily to the conclusion that $VR_{m,n} = m^n$.

How many different four-digit numbers are there? Since there are 10 different
digits (0, 1, 2, ..., 8, 9), the number is given by the variations with repetition of order
4 of the 10 digits: $VR_{10,4} = 10^4 = 10{,}000$ numbers (where numbers such as 0325,
0076 and 0005 are understood to be four-digit numbers). For how many of these
numbers are all four digits different? Now we are talking about variations without
repetition: $V_{10,4} = 10 \cdot 9 \cdot 8 \cdot 7 = 5{,}040$. Each of the other 4,960 four-digit numbers
has at least one repeated digit.
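Both counts can be confirmed by generating every four-digit string. The sketch below treats numbers with leading zeros, such as 0325, as four-digit numbers, as the text does, and uses `itertools.product` with `repeat=4` for the variations with repetition.

```python
from itertools import product

# All strings of four digits, including ones with leading zeros such as 0325
all_numbers = ["".join(d) for d in product("0123456789", repeat=4)]
distinct_digits = [s for s in all_numbers if len(set(s)) == 4]

print(len(all_numbers))                         # 10000 = VR(10, 4)
print(len(distinct_digits))                     # 5040 = V(10, 4)
print(len(all_numbers) - len(distinct_digits))  # 4960 with a repeated digit
```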


BEING CERTAIN 
If we wanted to be sure of winning a football pool, we would need to fill in coupons with all
the possible results. Remember that there are 3 possible results per game and that there
are 14 games in this pool. How many coupons will we need to fill out? The answer is
given by the variations with repetition of 3 elements choose 14, hence $VR_{3,14} = 3^{14} = 4{,}782{,}969$
bets. If we consider the full 15 (the option in which it is necessary to correctly guess the result
of 15 games), there are three times as many, hence $VR_{3,15} = 3^{15} = 14{,}348{,}907$ bets. Clearly
this is not a good business proposition: we would have to spend more money than we could
possibly win in prizes.


Combinations 


Let us now turn our attention to two problems: choosing 4 students from a class 
of 22 to play the flute, and preparing cups with 2 different-flavoured scoops of ice 
cream when there are 5 different flavours. Both problems share certain properties: 


— We need to select a sample (4 students from 22, and 2 flavours from 5). 

— The order in which they are selected is unimportant (the 4 students are going
to do the same thing, and the flavour pairs strawberry-vanilla and vanilla-strawberry
are the same).

— Elements cannot be repeated (a student cannot play more than one flute, and 
flavours cannot be repeated in the cup). 


In both cases, we are dealing with combinations. Given a set of m different elements,
the combinations of order n, or the combinations of m elements choose n, are all the
groupings of n of the m elements in which there are no repeated elements, and where
two combinations are different only if they differ in at least one of their elements.
This is written as $C_{m,n}$.
Let us consider the set 1, 2, 3, ..., m. The idea is now to form combinations of
order n of these m elements and calculate how many there are. As was the case for the
variations, each element considered individually constitutes a combination of order 1:

1   2   3   4   5   ...   m

Hence, $C_{m,1} = m$.


The combinations of order 2 are formed by adding each and every one of the 
subsequent elements to the right of the combinations of order 1: 


12   13   14   15   16   ...   1m
23   24   25   26   ...   2m
34   35   36   ...   3m
45   46   ...   4m.


Continuing with this process, adding to the right of each combination all the
elements that come after its last element, gives the following order 3 combinations:

123   124   125   126   127   ...   12m
134   135   136   137   ...   13m
145   146   147   ...   14m
234   235   236   237   ...   23m
245   246   247   ...   24m
345   346   347   ...   34m.


We proceed in the same way to form the combinations of order 4, 5, etc. Consider
all the order 3 combinations of 5 elements (1, 2, 3, 4 and 5):


123 124 125 134 135 145 
234 235 245 345. 


Taking each of these combinations and using it to form all the possible 
permutations of its elements gives the order 3 variations of the 5 elements. Consider, 
for example, the first combination. Let's write out all its permutations. This gives: 


123 132 213 231 312 321. 
Hence, the number of order 3 variations of 5 elements can be expressed as follows:

$$V_{5,3} = C_{5,3} \cdot P_3,$$

from which we can deduce:

$$C_{5,3} = \frac{V_{5,3}}{P_3} = \frac{5 \cdot 4 \cdot 3}{3 \cdot 2 \cdot 1} = \frac{60}{6} = 10.$$
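The relation $V_{5,3} = C_{5,3} \cdot P_3$ can be seen directly by grouping the variations. The sketch below lists all order 3 variations of 1 to 5 and groups them by the set of elements they contain (sorting each triple gives its combination). There are 10 groups, each holding the 3! = 6 orderings of one combination.

```python
from collections import Counter
from itertools import permutations

# All order-3 variations of 1..5, grouped by the combination they contain
variations_53 = list(permutations(range(1, 6), 3))
groups = Counter(tuple(sorted(v)) for v in variations_53)

print(len(variations_53))    # 60 = V(5, 3)
print(len(groups))           # 10 = C(5, 3)
print(set(groups.values()))  # {6}: each combination appears P_3 = 3! times
```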


We can generalise this result to find $C_{m,n}$:

$$C_{m,n} = \frac{V_{m,n}}{P_n} = \frac{m \cdot (m-1) \cdots (m-n+1)}{n \cdot (n-1) \cdots 3 \cdot 2 \cdot 1} = \frac{\frac{m!}{(m-n)!}}{n!} = \frac{m!}{n! \cdot (m-n)!}.$$


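Returning to our original examples, the number of different ways of selecting 4 out
of 22 students to play the flute is:

$$C_{22,4} = \frac{22!}{4! \cdot 18!} = \frac{22 \cdot 21 \cdot 20 \cdot 19}{4 \cdot 3 \cdot 2 \cdot 1} = 7{,}315.$$

The number of different cups of ice cream with two different-flavoured scoops
would be:

$$C_{5,2} = \frac{5!}{2! \cdot 3!} = \frac{5 \cdot 4}{2} = 10.$$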
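Both results can be checked by listing the combinations. In this sketch the students are numbered 0 to 21 and the flavours labelled F1 to F5; both labellings are hypothetical. `math.comb` (Python 3.8 or later) evaluates the formula $m!/(n! \cdot (m-n)!)$ directly.

```python
from itertools import combinations
from math import comb

flute_groups = list(combinations(range(22), 4))               # hypothetical student numbers
cups = list(combinations(["F1", "F2", "F3", "F4", "F5"], 2))  # hypothetical flavour labels

print(len(flute_groups), comb(22, 4))  # 7315 7315
print(len(cups), comb(5, 2))           # 10 10
```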


The model for putting objects in boxes can also be used. 


— In the first case, we must distribute 4 identical objects (the flutes) into 22 different
boxes (the students); the second entails distributing 2 identical objects (the
scoops) into 5 different boxes (the flavours).

— In both cases it is not possible to put more than one object in each box.

— The order in which the objects are placed in the boxes does not matter.


We say that the different ways of putting n identical objects into m different
boxes, such that each box contains no more than one object, are also the combinations
of m elements choose n.
Where elements may be repeated, we have combinations with repetition. Given a
set of m elements, the combinations with repetition of order n are all the groupings of
n of the m elements in which elements may be repeated and where two combinations
are different if they differ in at least one element. This is written as $CR_{m,n}$.

The process for forming the combinations with repetition is the same as that
for ordinary combinations, except that when going from one order to the next, we
append to each combination not only the elements that follow its last element, but
also that last element itself.

The derivation of the number of combinations with repetition is complicated,
and we shall limit ourselves to providing the formula for its calculation:

$$CR_{m,n} = C_{m+n-1,n} = \frac{(m+n-1)!}{n! \cdot (m-1)!}.$$


In the case of the ice creams, applying the formula above gives the following
number of different cups with two scoops (which may now have the same flavour):

$$CR_{5,2} = \frac{(5+2-1)!}{2! \cdot 4!} = \frac{6!}{2! \cdot 4!} = 15.$$


As might be expected, they are the 10 cups with two flavours we had already 
discovered, in addition to the five cups with two scoops of the same flavour. 
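`itertools.combinations_with_replacement` builds exactly the groupings described above, allowing an element to follow itself. The sketch below, with hypothetical flavour labels F1 to F5, lists the cups and separates out those with two scoops of the same flavour.

```python
from itertools import combinations_with_replacement
from math import comb

flavours = ["F1", "F2", "F3", "F4", "F5"]  # hypothetical flavour labels
cups_with_repeats = list(combinations_with_replacement(flavours, 2))

same_flavour = [c for c in cups_with_repeats if c[0] == c[1]]
print(len(cups_with_repeats), comb(5 + 2 - 1, 2))  # 15 15
print(len(same_flavour))                           # 5
```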


DISTINCTION BETWEEN COMBINATIONS AND VARIATIONS 


To distinguish between combinations and variations, the following situation may be of some 
use. We have 12 different colours and we wish to make: 


a) Tricolour flags with horizontal bands. 
b) New shades mixing three different colours. 


How many possibilities are there in each case? In the second example, the order is not im- 
portant, since there will be no difference in the end result. These are the combinations of 12 
colours choose 3 (220 different combinations). For the flags, we must consider the position 
of the bands and hence in this case order does matter. They are the variations of 12 colours 
choose 3 (1,320 different variations). 
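The two answers differ by a factor of 3! = 6, the number of ways of ordering the same three colours in the bands. A short check with the standard `math` module (Python 3.8 or later):

```python
from math import comb, perm

colours = 12
shades = comb(colours, 3)  # order irrelevant: the mix is the same
flags = perm(colours, 3)   # order matters: top, middle and bottom band

print(shades, flags)    # 220 1320
print(flags // shades)  # 6 = 3!: orderings of each set of three colours
```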
Combinatorial numbers


The combinatorial number of order n is the number of order n combinations of m
elements. It is written as

$$\binom{m}{n}$$

and read as ‘m over n’. According to the definition, we have:

$$\binom{m}{n} = C_{m,n} = \frac{m!}{n! \cdot (m-n)!}.$$


The upper value, m, is often referred to as the numerator, and the lower, n, as the order of the combinatorial number, where $m \ge n$. For example:


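$$\binom{5}{2} = \frac{5!}{2!\,3!} = \frac{5 \cdot 4 \cdot 3!}{2 \cdot 3!} = \frac{20}{2} = 10,$$

$$\binom{10}{2} = \frac{10!}{2!\,8!} = \frac{10 \cdot 9 \cdot 8!}{2 \cdot 8!} = \frac{10 \cdot 9}{2} = 45.$$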
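The definition translates directly into code. This is a sketch rather than an efficient implementation: it computes full factorials, and the standard library's `math.comb`, used here only as a cross-check, is the practical choice.

```python
from math import comb, factorial


def combinatorial_number(m: int, n: int) -> int:
    """Return m over n = m! / (n! (m - n)!), for 0 <= n <= m."""
    if not 0 <= n <= m:
        raise ValueError("the order n must satisfy 0 <= n <= m")
    # Integer division is exact here: the quotient is always a whole number.
    return factorial(m) // (factorial(n) * factorial(m - n))


print(combinatorial_number(5, 2))   # 10
print(combinatorial_number(10, 2))  # 45

# Cross-check against the standard library for small values.
assert all(combinatorial_number(m, n) == comb(m, n)
           for m in range(20) for n in range(m + 1))
```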


Properties 


There exists a series of properties that hold for combinatorial numbers: 


1. A combinatorial number of order 0 has the value 1: 


$$\binom{m}{0} = \frac{m!}{0!\,(m-0)!} = \frac{m!}{m!} = 1.$$


2. All first-order combinatorial numbers are equal to their numerator: 


$$\binom{m}{1} = \frac{m!}{1!\,(m-1)!} = \frac{m\,(m-1)!}{(m-1)!} = m.$$
3. All combinatorial numbers whose order is the same as their numerator are equal to 1:

$$\binom{m}{m} = \frac{m!}{m!\,0!} = 1.$$


4. Complementary combinatorial numbers are numbers whose orders add up to the numerator:

$$\binom{m}{n} \quad\text{and}\quad \binom{m}{m-n}.$$

Two complementary combinatorial numbers have the same value:

$$\binom{m}{n} = \binom{m}{m-n}.$$


5. The sum of two combinatorial numbers with the same numerator and successive orders is another combinatorial number whose numerator is one more than the numerator of the numbers added together, and whose order is equal to the greater of their orders:

$$\binom{m}{n} + \binom{m}{n+1} = \binom{m+1}{n+1}.$$


Consider the following example:

$$\binom{100}{2} = \frac{100!}{2!\,98!} = \frac{100 \cdot 99 \cdot 98!}{2 \cdot 98!} = \frac{100 \cdot 99}{2} = 4{,}950.$$
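The five properties can be spot-checked numerically. The loop below (an illustration, not a proof) tests them with `math.comb` for every numerator up to 30; property 5 is checked in the form $\binom{m}{n} + \binom{m}{n+1} = \binom{m+1}{n+1}$.

```python
from math import comb

# Check the five properties for every numerator m from 1 to 30.
for m in range(1, 31):
    assert comb(m, 0) == 1                       # 1. order 0
    assert comb(m, 1) == m                       # 2. order 1
    assert comb(m, m) == 1                       # 3. order equal to numerator
    for n in range(m + 1):
        assert comb(m, n) == comb(m, m - n)      # 4. complementary numbers
    for n in range(m):
        assert comb(m, n) + comb(m, n + 1) == comb(m + 1, n + 1)  # 5. sum rule

print(comb(100, 98), comb(100, 2))  # 4950 4950
```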
POKER


In poker, there are many different hands (five cards selected without replacement from a pack of 52 cards) with different values. The harder a hand is to obtain, the more it is worth. The number of possible different hands is the number of combinations of 52 cards choose 5:

$$C_{52,5} = \binom{52}{5} = \frac{52!}{5!\,47!} = 2{,}598{,}960.$$
That's not bad: more than two and a half million possibilities. 
How many four-of-a-kind hands are there (four cards of the same rank and another that is different)? Since there are 13 different ranks in each suit, there are 13 possible cases, and in each of these, the other card can be any of the remaining 48 cards from the pack. Hence, the number K of four-of-a-kind hands is $K = 13 \cdot 48 = 624$.
For a full house (three cards of the same rank, and two of a different rank), there are 13 possible ranks for the three cards, and for each rank there are $C_{4,3} = 4$ ways of choosing three of its four suits: $13 \cdot C_{4,3} = 13 \cdot 4 = 52$. The pair must have a different rank from the three cards (there cannot be two more cards of the same rank, and if there was one, we would have a four-of-a-kind), so there are 12 possible ranks, each with $C_{4,2} = 6$ ways of choosing two suits, hence $12 \cdot C_{4,2} = 12 \cdot 6 = 72$. Each possible set of three cards can be combined with each possible pair, hence the total number F of possible full houses is $F = 52 \cdot 72 = 3{,}744$. A small number, as we can see, but much greater than the number of possible four-of-a-kinds: six times greater!
In the case of a straight flush (five consecutive cards of the same suit), since there are 13 cards in each suit, it is possible to obtain nine different straight flushes in each, a total of 36 straight flushes.
Once we have seen the possibilities for each of the hands (for the first player to receive their cards, and then varying for the others depending on the cards received by the previous players), it is possible to better understand the value of the hands: the scarcer they are, the higher their value.
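The poker counts above are products of combinatorial numbers and can be reproduced in a few lines of Python. The straight-flush figure is taken from the text rather than derived, since it depends on how the ace and the royal flush are treated.

```python
from math import comb

hands = comb(52, 5)                                 # all five-card hands
four_of_a_kind = 13 * 48                            # rank of the four, then any fifth card
full_house = (13 * comb(4, 3)) * (12 * comb(4, 2))  # three of one rank, pair of another
straight_flush = 4 * 9                              # nine per suit, as counted in the text

print(hands, four_of_a_kind, full_house, straight_flush)  # 2598960 624 3744 36
print(full_house // four_of_a_kind)                       # 6
# Scarcity as frequency: about 1 hand in 4,165 is a four-of-a-kind.
print(round(hands / four_of_a_kind))                      # 4165
```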


Pascal's Triangle 


When we need to obtain a complete series of combinatorial numbers, such as:

$$\binom{4}{0},\ \binom{4}{1},\ \binom{4}{2},\ \binom{4}{3},\ \binom{4}{4},$$
the easiest way of calculating them is to construct Tartaglia’s or Pascal’s Triangle all the way down to the corresponding row (this can now be done using computers or calculators). In spite of its name, the existence of this numeric triangle dates back many more years, to ancient Indian civilisations (2,000 years before Pascal) and China (1,700 years before Pascal). The French mathematician made wide use of the triangle in the calculation of probabilities, which is why it has been named after him. It is an easily obtainable triangle of numbers: all the rows start and end with 1; the first is made up of two ones; each of the intermediate terms in the lower rows is obtained by adding the two numbers in the row above immediately to its left and right. And of course, the numbers of row n correspond to the combinatorial numbers with numerator n.


        1
       1 1
      1 2 1
     1 3 3 1
    1 4 6 4 1


In this way, we can easily obtain the values of the series of combinatorial numbers in which we are interested. In our case, the fourth row shows us that:

$$\binom{4}{0} = 1,\quad \binom{4}{1} = 4,\quad \binom{4}{2} = 6,\quad \binom{4}{3} = 4,\quad \binom{4}{4} = 1.$$
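The additive rule can be sketched in Python. Following the text, which calls the row of two ones the first row and gives row n the numerator n, the single 1 at the apex is counted here as row 0.

```python
from math import comb


def pascal_rows(count: int) -> list[list[int]]:
    """Build the first `count` rows, starting from the single 1 at the apex (row 0)."""
    rows = [[1]]
    for _ in range(count - 1):
        prev = rows[-1]
        # Each inner entry is the sum of the two entries just above it.
        inner = [a + b for a, b in zip(prev, prev[1:])]
        rows.append([1] + inner + [1])
    return rows


rows = pascal_rows(6)
for row in rows:
    print(row)
print(rows[4] == [comb(4, k) for k in range(5)])  # True
```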


Comellas’ labyrinth 


Sometimes the counting of possibilities is used in unexpected contexts, with peculiar applications and surprising results. In the book Notions of Prosody and Its Applications to the Metric Art, by Bartolomé Comellas, published in Palma de Mallorca in 1876, the following example appears in the chapter titled Labyrinths. Comellas worked out how many ways it was possible to arrange the phrase ‘Dios está por todas partes’ (God is everywhere) into triangles with ‘Ds’ and ‘Ss’ forming the hypotenuses.
[Figure: Comellas’ labyrinth, with the letters of DIOS ESTA POR TODAS PARTES arranged in triangles.]


Comellas gives the solution: the phrase can be read in precisely $1{,}024^2 = 1{,}048{,}576$ ways. More than one million! We leave it up to the reader to find them or check the author's claim, although we offer the following hints:


“In the labyrinth there are various allegorical or symbolic ideas. The 
triangles represent God, Trinity and the Righteous. 

“On the hypotenuse of the first, the largest side of the triangle, is the name 
God, and additionally, in the centre of the whole figure, the first letter of the 
equivalents of the Greek letter with which Theos, God, and the Graeco-Latin 
names, Theologia Theodicea, which refer to Him begin. 

“The circumference or crown of stars, which contains the attributes of 
God, represent Eternity, which was represented by the ancients by means of a 
coiled serpent biting the end of its tail, and the immensity and omnipotence 
of God the Creator. 

“The circle represents Catholicism. 

“The legs (short sides), which form a cross, represent Christianity as the 
true religion or cult of God. 

“The two triangles in the form of an hourglass are regarded as fixed and 
inclined, turning on their axis, and represent the time that does not pass 
through God, or his presence and coexistence. 

“The moral impossibility of reading all the shapes of the labyrinth, without 
repeating them or confusing them, represents the incomprehensible and 
inexplicable nature of the mysteries of religion, as one of the characters of
the infinite and the divine.” 


Mozart's musical dice game 


Wolfgang Amadeus Mozart (1756-1791) invented a game which, with the help of two dice, made it possible to compose 16-bar musical pieces without an understanding of music or composition. His Musikalisches Würfelspiel is a musical composition generator. It does not contain a score for a small piece with 16 bars, but is a system that uses the results of throwing two dice to generate a vast range of different musical compositions, each with 16 bars. Mozart wrote 176 numbered bars, from 1 to 176, and organised them in a table with 16 columns, each of which had 11 rows with one bar. The procedure for generating an individual ‘piece’ of music using the dice is as follows: the two dice are thrown 16 times (once per column), and on each throw the bar in the row corresponding to the sum of the numbers on the dice is selected (11 possible results, from 2 to 12).
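The dice procedure can be simulated. The table below is a hypothetical stand-in (bars 1 to 176 filled in column by column), since Mozart's actual arrangement is not reproduced here; only the selection rule, one throw of two dice per column with the total picking the row, follows the text.

```python
import random
from collections import Counter
from itertools import product

COLUMNS = 16  # one column per bar of the piece
ROWS = 11     # one row per dice total, 2 to 12

# Hypothetical stand-in for Mozart's table: bars 1..176 filled column by column.
table = [[col * ROWS + row + 1 for row in range(ROWS)] for col in range(COLUMNS)]


def compose(rng: random.Random) -> list[int]:
    """Pick one bar per column using the total of two dice."""
    piece = []
    for col in range(COLUMNS):
        total = rng.randint(1, 6) + rng.randint(1, 6)  # 2..12
        piece.append(table[col][total - 2])            # row index 0..10
    return piece


print(compose(random.Random(1)))

# The 11 totals are not equally likely: 7 appears in 6 of the 36 outcomes.
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))
print(totals[7], totals[2], totals[12])  # 6 1 1
```

Because a 7 comes up in 6 of the 36 equally likely outcomes of two dice and a 2 or a 12 in only 1, the bars in the middle row are chosen six times as often as those in the first and last rows.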


[Table: Mozart’s 176 numbered bars, arranged in 16 columns (one per bar of the piece) and 11 rows (one per dice total, from 2 to 12).]


Without going into details (some bars are identical despite being identified by different numbers), the number of possible compositions is $11^{16}$! If all the possible scores were performed, and each performance lasted just 30 seconds, it would take more than 40 billion years of continuous playing to exhaust all the possibilities. Scientists estimate that the Big Bang (the start of the Universe as we know it) took place between 13 and 15 billion years ago, and that our Sun will last for another 5 billion years. Hence, in order to perform all the pieces that could be generated using the dice, we would need to colonise other solar systems. We have enough music to last us for a long time!
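The time estimate is simple arithmetic. This sketch assumes a year of 365.25 days and uninterrupted 30-second performances.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed length of a year

compositions = 11 ** 16  # one of 11 rows for each of the 16 columns
seconds = compositions * 30
years = seconds / SECONDS_PER_YEAR

print(f"{compositions:,}")                 # 45,949,729,863,572,161
print(f"{years / 1e9:.1f} billion years")  # 43.7 billion years
```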


Queneau and combinatorial poetry 


Raymond Queneau (1903-1976), poet, mathematician, and member of the Oulipo group, carried out an experiment similar to Mozart’s but using sonnets. He wrote a book entitled Cent mille milliards de poèmes (A Hundred Thousand Billion Poems). As the author states in the introduction, “This small work makes it possible for everyone to compose millions of sonnets, all of which are regular and understandable.”

It is a small book: ten pages, each with a sonnet. However, each page is divided into 14 strips, each of which contains one line of the sonnet on that page. Combining the horizontal strips, it is possible to make $10^{14}$ sonnets. If we spent half a minute reading each of the sonnets (an extremely fast pace), leaving no time for changing sonnets, we would need 95 million years to read them all (without stopping or resting).
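The same arithmetic applies to Queneau’s book: choosing one of ten strips for each of the 14 lines gives $10^{14}$ sonnets, read here at 30 seconds each over years of 365.25 days (an assumption).

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed length of a year

sonnets = 10 ** 14  # one of 10 strips for each of the 14 lines
years = sonnets * 30 / SECONDS_PER_YEAR

print(f"{sonnets:,}")                      # 100,000,000,000,000
print(f"{years / 1e6:.0f} million years")  # 95 million years
```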


Copy of A Hundred Thousand Billion Poems, a book 
whose title refers to the number of sonnets that can be 
made by combining its lines. 


The example of the sonnets, together with Mozart’s musical idea, gives us a sense of the number of groups that can be obtained starting with just a few elements, making clear the need for suitable strategies to count all the possible situations that may arise from an experiment.




Chapter 2 


The History of Probability 


The beginning of mathematics, as we understand it today, dates back to classical Greece, like practically all of our culture. The foundations of what was later to become mathematics date back some 2,300 years, to Euclid’s Elements of Geometry, one of the greatest best-sellers in history, and not only in terms of scientific literature. Euclid’s aim in writing the book was twofold: on the one hand, to bring together the mathematical results known at the time (to have a sort of encyclopaedia that could be used as a textbook) and, on the other, to obtain a model of how to prove results and build a mathematical theory, with axioms and rules for deduction. Hence, he managed to separate mathematical reality from the surrounding physical reality. Starting from a few ‘self-evident’ results, certain predetermined laws were used to arrive at new truths. The whole edifice was supported by the axioms, such that changing them would give rise to a new mathematics. This occurred in the 19th century, when one of the least evident postulates, the fifth, which states that “through a point not on a given straight line, it is only possible to draw one line parallel to it”, was called into question. Replacing it gave rise to other geometries, referred to as ‘non-Euclidean’.

The final objective of Greek mathematics, the highest form of which was geometry, was to find truths or certainties. This is why it did not follow the most suitable path for discovering results related to uncertainty. Put another way, since their method derived, from a few axioms accepted without proof, a chain of certainties whose results were indisputable, the ancient Greeks were going in the opposite direction from the one they would need to follow in order to deal with chance. They insisted on finding the absolute truth and rejected all uncertain statements.

This is why there is nothing related to probability in the Elements or any subsequent Greek book. There was an insurmountable mental block produced by the mainstream point of view, which existed in spite of the fact that the Greeks, like other previous and contemporary civilisations, were passionate about games, particularly those that made use of knucklebones (actually the hock bones of a sheep or goat) and dice, as archaeological findings have shown. However, there were other problems, too. The Greeks believed the will of the gods was revealed by various procedures, including the results of throwing knucklebones, such that if a given result came up, it was the express desire of the gods, and it was meaningless to try to understand what was going to happen next time, or to fathom randomness. This is apparent in the writings of Plato, including the words he attributes to Socrates. Furthermore, the Greeks suffered from another drawback that made it almost impossible to tackle this issue. Their numbering system was ill-suited to it and made the required calculations very difficult (although it did not impede the study of the issues in which they were interested, such as the properties of numbers or different types of numbers, such as primes, perfect numbers, amicable numbers, polygonal numbers, etc.). It is well known that the Roman numbering system was somewhat lacking when it came to carrying out calculations as well, but the Greeks’ was even worse. They used letters to represent numbers: the 24 letters of their alphabet, together with three additional archaic symbols, gave nine signs for the units from 1 to 9, nine for the tens from 10 to 90 and nine for the hundreds from 100 to 900. Nor did they have a zero, an ‘invention’ that came later from Indian civilisation.

These drawbacks made calculations difficult.
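To give an idea of how the alphabetic system worked, the sketch below writes numbers from 1 to 999 in Ionic Greek numerals. It follows the standard table, in which the three archaic signs are interleaved with the letters (stigma for 6, koppa for 90, sampi for 900), and it leaves out the keraia, the stroke that usually marked a letter as a numeral, as well as the conventions for thousands.

```python
# Ionic (alphabetic) Greek numerals for 1-999: nine signs each for the units,
# tens and hundreds. The archaic signs are stigma (6), koppa (90), sampi (900).
UNITS = "αβγδεϛζηθ"     # 1, 2, ..., 9
TENS = "ικλμνξοπϟ"      # 10, 20, ..., 90
HUNDREDS = "ρστυφχψωϡ"  # 100, 200, ..., 900


def to_greek(n: int) -> str:
    """Write 1 <= n <= 999 additively: hundreds sign, then tens, then units."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch only covers 1 to 999")
    hundreds, rest = divmod(n, 100)
    tens, units = divmod(rest, 10)
    out = ""
    for digit, signs in ((hundreds, HUNDREDS), (tens, TENS), (units, UNITS)):
        if digit:  # there is no zero sign: an empty place is simply left out
            out += signs[digit - 1]
    return out


print(to_greek(5), to_greek(30), to_greek(216))  # ε λ σιϛ
```

With no zero and a separate sign for every value, arithmetic relied on memorising 27 letter values rather than reusing the same few digits in every position, which is one way to see why calculation was so cumbersome.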


A small painted terracotta sculpture from Greece, dated to 
between 340 and 330 BC, depicting two women playing
knucklebones. The small bones appear in their hands. 
An additional difficulty lay in the fact that the instruments of chance that
they used were not regular. A ‘knucklebone’ has six faces, but only four of these 
are stable surfaces that allow it to come to a rest. These four, with variations 
according to the animals from which the bones came, had probabilities of around 
40% for two of the faces and 10% for the other two. 

In Roman times, the focus of mathematics shifted, in spite of the fact that 
Greek culture formed the basis of Roman thought. For the Romans, the most 
important aspect of mathematics was not truth or beauty, which had preoccupied 
the Greeks, but its use for measuring, counting and calculating, using it to 
live more comfortably and achieve military superiority. It was no longer one 
of the most important parts of knowledge and became a practical tool. This 
is surely why the Romans failed to bequeath to posterity the name of a
distinguished scholar in this field (in contrast to the plethora of brilliant Greek
mathematicians whose fame continues to this day: Pythagoras, Thales, Euclid, 
Diophantus, Archimedes, etc.). However, the Romans did use it as a tool for 
developing the techniques required for building the impressive public works that 
spread throughout their extensive territory, many of which are still standing in 
Europe, western Asia and North Africa. 

This is why, in spite of believing that instruments of chance were a method used 
by the gods to reveal their wishes, they began to deal with probability. In fact, Cicero 
wrote that “probability is the very guide of life”, leading him to question whether 
the result of throwing dice depended on the direct intervention of a god. This caused 
him to question astrology, a belief system that was deeply rooted at the time and 
that continues to have many followers today, as shown by the fact that horoscopes 
are still included in the majority of newspapers. At any rate, Cicero left us the word 
‘probability’ (derived from probabilis). 

Throughout the Middle Ages, chance remained unstudied. As before, religious influences did much to limit the development of thought in this period. The overriding attitude was that God was present everywhere: “Some causes are known, others are not, but nothing happens without a cause”, meaning that nothing is random and nothing is a product of chance. The conviction that all events, whether important or trivial, occurred as a result of divine providence represented a serious obstacle to the development of the calculation of probabilities. For example, in the 13th century, following this line of thought, King Louis IX of France banned not only games of chance but also the manufacture of dice, equating them with other vices, such as frequenting taverns and fornication.
THE EARLIEST GAMES OF CHANCE

We know from various sources (paintings, pottery, writings) that hock bones were used by many ancient civilisations, including the Egyptians, the Greeks and the Romans. Archaeological excavations of sites dating back 40,000 years have revealed a proportion of ankle bones up to six times higher than that of other bones, leading us to believe that humans used them in games even then. Plastic figures, known as Go-Go Crazy Bones, are a popular modern derivative of this ancient game.

One of the most commonly used man-made instruments for games of chance is the cubic die. The oldest known example is ceramic and was discovered in the north of Iraq. It has been dated to the start of the third millennium BC. The dots are arranged differently from those on modern dice (on which opposite faces add up to 7), as shown in the figure.

Dice with dots arranged in different positions have also been found from the pharaonic era of Egypt. Herodotus provides us with an account of how a period of famine was alleviated in ancient Lydia, around 1500 BC: people played dice non-stop for a full day to avoid feeling hungry, whereas the following day they would eat and did not play dice. His account states that around 18 years passed in this manner!

In Greece and Rome, games of chance constituted a genuine passion. Homer states that, as a child, Patroclus became so angry with an opponent when playing knucklebones that he killed him. In Rome, knucklebones became so popular that at certain times laws were enacted to prohibit them, representing the start of a long history of prohibition when it comes to games of chance. The Roman emperor Claudius was such an enthusiast of dice that he often played while travelling, and even wrote a book on them.

The origin of playing cards is more recent, although they have suffered from the same bad reputation as other instruments of chance. Their exact origins are unknown, although theories abound. Nevertheless, they represented a genuine innovation among the games of mediaeval times. The earliest documentation of card games in Europe dates back to 1376, when they were prohibited in the city of Florence. Although the historical samples that have been conserved are rare due to their fragility, the adoption of this game can be traced by following the prohibitions to which it was subjected in different places throughout Europe.


Precursors to probability


The first glimmers of what would later become probability come from the great figures of the Italian Renaissance, including Tartaglia, Peverone, Galileo and Cardano. Their arguments appear in the context of games, as in the so-called ‘problem of division or distribution’. In 1494, Luca Pacioli (c. 1445-1517) stated the problem as follows: “Two teams play pelota in such a way that they need 60 points to win the game, and each ‘goal’ is worth 10 points. The bets are 10 ducats. An incident occurs which makes it impossible to end the game and one team has 50 points and the other 30. We wish to find out the share of the prize money that corresponds to each team.”

Niccolò Fontana, known as Tartaglia (1499-1557), argued the solution to the problem as follows: “Supposing it is necessary to reach 6 goals, and A has already scored 5 and B has scored 3, in my opinion, the fairest distribution is 2 to 1, since A is two goals ahead of B. This is one third of the total number of goals required to win. Hence, A should receive one third of the bets. The remainder should be divided equally, giving A an advantage over B in the proportion 2 to 1.” However, Tartaglia did not fully stand by his reasoning, acknowledging that, “The resolution of this issue should be more judicial than mathematical, such that, regardless of the way in which the decision is made, there will be grounds for a dispute.”

In 1558, the Italian Giovanni Francesco Peverone (in his small book Due brevi e facili trattati, il primo d’Arithmetica l’altro di Geometria) provides a more correct solution
to the problem: “Suppose that A only needs to score another goal to win the prize
to the problem: “Suppose that A only needs to score another goal to win the prize 
and bets one unit. If B also has one bet, they will also bet one unit. Hence, the prize 
should be divided equally. If B has two plays, they must pay two units more to reach 
the position that leaves them with just one play. Hence, the prize must be divided 3 
to 1. If B has three plays remaining, they must pay two times more once again, and 
so on. Hence, in Pacioli’s problem, the prize would be divided 7 to 1.” 
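Peverone’s 7 to 1 agrees with the method later used by Pascal and Fermat. The sketch below is that later method, not Peverone’s own argument: it assumes each remaining goal is equally likely to go to either team and computes A’s probability of winning the match, which is taken as A’s fair share of the prize.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def share_of_a(a_needs: int, b_needs: int) -> Fraction:
    """Probability that A wins the match, each remaining goal being a fair 50/50."""
    if a_needs == 0:
        return Fraction(1)
    if b_needs == 0:
        return Fraction(0)
    half = Fraction(1, 2)
    # Next goal goes to A or to B with equal probability.
    return half * share_of_a(a_needs - 1, b_needs) + half * share_of_a(a_needs, b_needs - 1)


# Pacioli: 6 goals to win, A has 5 and B has 3, so A needs 1 and B needs 3.
p = share_of_a(1, 3)
print(p, p / (1 - p))  # 7/8 7
```

The call `share_of_a(1, 2)` gives 3/4, matching Peverone’s 3 to 1 for the case in which B still needs two plays.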
THE PREHISTORY OF PROBABILITY

In order to study probability, it is useful to have tackled problems from combinatorics. As a brief historical overview shows, such problems already appear in the famous Chinese book I Ching (Book of Changes), 1200 BC, with its combinations of mystic trigrams. For their part, Greek philosophers occasionally considered problems now solved using combinatorial calculus, although they did not have theories on the issue. The Latin thinker Boethius (fifth century) also provided a detailed description of a rule for finding the combinations of n objects choose 2. Similarly, the Tudelan astronomer A. Ben Meir Ibn Ezra (11th century) and the Catalan Jew Levi Ben Gerson (14th century) studied rules for the calculation of variations and combinations, and were able to calculate combinatorial numbers. In the Middle Ages, combinatorial situations were associated with the alchemical magic of signs. That is why the Majorcan thinker and alchemist Ramon Llull (13th century) is often mentioned as the founder of the theory of combinations, since he wished to represent all the elements of the Universe using true signs, and generate all their combinations to produce true signs for all possible compounds. Galileo enumerated the 216 different ways in which three perfect dice could land, although he did not use combinatorics, instead basing his results on arithmetical methods.


The invention of movable type resulted in the appearance of books on the various games that were popular at the time, ending the need to pass on rules verbally. Girolamo Cardano (1501-1576) wrote Liber de ludo aleae, the first book related to the world of chance. Its goal was to calculate the different possibilities for throwing various dice, in addition to solving problems related to the division of groups. As it lacks an appropriate notation, it frequently resorts to specific examples. The modern rules for the union and intersection of events are not used anywhere in the book; instead, two methods appear: counting the different possibilities, and the concept of average earnings. Somewhat curiously, the book begins with some moralising advice on the dangers of games. Cardano used concepts from what is now known as the classical definition of probability, although he did not define them. He introduced the idea of assigning a number (probability) p between 0 and 1 to an event whose result is unknown, considering the total number of results and the number of favourable results. He also came across what is now known as the ‘law of large numbers’, stating that if the probability of an event is p, after a large number of repetitions N it is reasonable to bet that it will occur around Np times. However, Cardano did not recognise the theoretical importance of these concepts because he regarded the relations as merely arithmetic, instead of as a measurement of the possibility of a random event occurring.
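Cardano’s observation can be illustrated by simulation. In this sketch p = 1/6 (rolling a six) is only an example value, and Python’s seeded `random.Random` stands in for real throws.

```python
import random


def relative_frequency(p: float, trials: int, seed: int = 0) -> float:
    """Simulate `trials` independent events of probability p; return hits / trials."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    hits = sum(1 for _ in range(trials) if rng.random() < p)
    return hits / trials


p = 1 / 6  # example value: the probability of rolling a six
for n in (60, 6_000, 600_000):
    freq = relative_frequency(p, n)
    # Observed count versus the expected count Np.
    print(n, round(freq * n), round(n * p))
```

The relative frequency settles near p as N grows. The count itself is close to Np only in relative terms: its absolute distance from Np typically grows like the square root of N.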

Later on, Galileo Galilei (1564-1642) would return to some of the problems stated by Cardano and provide solutions. Between 1613 and 1624 he wrote a work dealing with the matter, which appears in his selected works published in 1718 under the title Considerazione Sopra il Gioco dei Dadi. It includes the following problem: “When a balanced die is rolled, a score of 1, 2, 3, 4, 5 or 6 can be obtained with equal probability. When two dice are thrown, the sum of the scores obtained will be between 2 and 12. Both 9 and 10 can be obtained in two different ways from the numbers 1, 2, 3, 4, 5, 6: 9 = 3 + 6 = 4 + 5 and 10 = 4 + 6 = 5 + 5. In the problem with three dice, both 9 and 10 can be obtained in various ways, which are as follows: 10 can be obtained from any of the events {(1,3,6), (1,4,5), (2,2,6), (2,3,5), (2,4,4), (3,3,4)}, whereas 9 can be obtained from {(1,2,6), (1,3,5), (1,4,4), (2,2,5), (2,3,4), (3,3,3)}; in both cases there are 6 favourable events. How is it possible that, upon throwing the three dice many times, the sum of 10 appears more often than the sum of 9?”

To solve the problem, Galileo carried out a careful analysis of all the combined scores that could be obtained by throwing three dice, leading him to conclude there were 216 possible cases. Of these, 27 add up to 10, and 25 add up to 9. His reasoning is noteworthy and similar to that used today, leading us to remark that the concept of ‘equally probable’ faces of a balanced die was known as far back as the 16th century.
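Galileo’s count can be reproduced by listing all 6³ = 216 ordered outcomes of three distinguishable dice. The sketch also recovers the six unordered ways of writing each total, showing that the paradox disappears once order is taken into account.

```python
from collections import Counter
from itertools import product

# All ordered outcomes of three distinguishable dice: 6**3 = 216 equally likely cases.
outcomes = list(product(range(1, 7), repeat=3))
totals = Counter(sum(o) for o in outcomes)

# Unordered ways of writing each total (the "six ways" counted by gamblers).
ways = {s: {tuple(sorted(o)) for o in outcomes if sum(o) == s} for s in (9, 10)}

print(len(outcomes), totals[9], totals[10])  # 216 25 27
print(len(ways[9]), len(ways[10]))           # 6 6
```

An unordered way such as 3 + 3 + 3 corresponds to a single ordered outcome, whereas 1 + 3 + 6 corresponds to six; that is where the difference between 25 and 27 comes from.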

However, Galileo’s main contribution to probability theory was the creation 
of the theory of error measurement. Galileo believed errors of measurement were 
inevitable and that there were two types: ‘systematic’, which were the result of the 
methods and tools used in the measurement; and ‘random’, which vary unpredictably 
from one measurement to another. The classification continues to be widely used, and it means that Galileo did not only contribute to the development of probability theory, but also laid the foundations for the birth of statistics.
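The distinction between the two kinds of error can be illustrated with a simulation. The model is an assumption for illustration only (a fixed bias for the systematic error and Gaussian noise for the random error; the normal distribution is a later development): averaging many readings shrinks the random error but leaves the bias untouched.

```python
import random
from statistics import mean


def measure(true_value: float, bias: float, noise_sd: float, n: int,
            seed: int = 0) -> list[float]:
    """Simulated readings: true value + systematic bias + Gaussian random error."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


for n in (10, 100, 10_000):
    readings = measure(true_value=100.0, bias=0.5, noise_sd=2.0, n=n)
    # The average approaches 100.5 (true value + bias), never 100.0.
    print(n, round(mean(readings), 3))
```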


The beginnings of probability theory 


There were many more learned forerunners, but there is almost complete unanimity that the birth of probability theory proper occurred in the correspondence between Pascal and Fermat, in their attempts to solve the problems proposed to the former by the Chevalier de Méré. It appears that around 1652, the strict and spiritual Blaise Pascal (1623-1662) met the worldly gambler Antoine Gombaud, known as the Chevalier de Méré (1607-1684), one of the many nobles with a passion for dice and card games. He was almost a professional gambler, a learned and intelligent man, someone who understood that reflecting on games and having a better understanding of them would give him certain advantages. In their conversation, he proposed a series of problems to Pascal, which captured his attention and on which he worked with Pierre de Fermat (1601-1665).

The correspondence between them saw the meeting of two great minds and gave rise to the calculation of probabilities in earnest. It should be noted that Fermat and Pascal, in spite of the depth of their scientific relationship, and both being French (living around 600 km apart, in Toulouse and Paris), never met in person. Their relationship was conducted purely by letters, underlining the power of communication by letter at the time and the contribution it made to science.

STATISTICS

There are no works on probability and few allusions to it, although there are records of statistical exercises carried out in different civilisations. In China, as long ago as 2000 BC during the Hsia Dynasty, censuses were already being carried out, and during the Chow Dynasty (1111-211 BC) there was an administrative figure responsible for this work. Similarly, in the Roman Empire the censor was an important figure responsible for censuses, as suggested by the name. In India, references to the use of statistics date back to the 4th century, and they can also be found in the Old Testament and in Egypt around the time of the Pharaohs, for recording the water level of the River Nile. However, it was not until the 17th century that John Graunt (1620-1674) carried out his mortality forecasts, relating statistics and probability, something that has survived to the present day.

In Spain, the first statistical census was carried out by Ferdinand the Catholic in 1495 to administer the hearth tax of the Kingdom of Aragon. The tax received this name because it counted the number of hearths or homes in the land, an important figure for raising taxes and recruiting an army should the king of France attack. The careful accounting of the populations of places and the jobs of individual inhabitants, as well as the verification of the poor (those unsuitable for productive labour), provided valuable information about productivity and the state of the population at the time. (In addition, it made it possible to know the location and absolute and relative numbers of Christians and Muslims in the land.)
Portraits of Pierre de Fermat (left) and Blaise Pascal (right).


The three problems that De Méré proposed to Pascal and that have provided so much ‘play’ for posterity were:

1. Suppose two players, A and B, take part in a £60 bet. They agree that the first to obtain 3 points will win the bet. However, when A has obtained 2 points and B has obtained 1, they decide to stop the game by mutual agreement. How should the £60 bet be distributed?

2. In a game in which 3 dice are thrown, who has the greater chance of winning, the person who bets on number 9 or the person who bets on number 10?

3. Is it advisable to bet that at least one 6 will appear if a die is thrown four times?


We have already considered the first problem, albeit in a slightly different form. In terms of the second, De Méré confessed to Pascal that his intuition told him it was better to bet on 10 than on 9, although he was unable to give a clear explanation, since the number of different ways of writing 10 and 9 as the sum of 3 numbers (between 1 and 6, the possible results of a die) was the same. In fact, there were six possible sums for each case:

$$9 = 1+2+6 = 1+3+5 = 1+4+4 = 2+2+5 = 2+3+4 = 3+3+3;$$
$$10 = 1+3+6 = 1+4+5 = 2+2+6 = 2+3+5 = 2+4+4 = 3+3+4.$$


De Méré’s intuition was founded on the fact that if we carry out simple 
calculations on the favourable circumstances in which a 9 or 10 occurs: 


— The probability of winning betting on 9= 25/216. 
— The probability of winning betting on 10= 27/216, 


Hence, there is a small difference of just 2/216 = 1/108 in favour of 10 over 9.
If the dice are thrown just once, the difference is almost irrelevant, but it
becomes noticeable when the game is played systematically. De Méré clearly
possessed the intuition of a great gambler, doubtless the product of his considerable
experience.
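The counting argument can be checked by brute force. The minimal sketch below lists all 216 ordered outcomes of three fair dice and groups them by their unordered sum; the helper names `prob_sum` and `ways_by_partition` are illustrative, not taken from the text.

```python
from fractions import Fraction
from itertools import product

def prob_sum(total: int, dice: int = 3) -> Fraction:
    """Exact probability that `dice` fair six-sided dice add up to `total`."""
    outcomes = list(product(range(1, 7), repeat=dice))  # all 6**dice ordered outcomes
    favourable = sum(1 for roll in outcomes if sum(roll) == total)
    return Fraction(favourable, len(outcomes))

def ways_by_partition(total: int) -> dict:
    """Group the ordered outcomes of three dice by their unordered sum (e.g. 1+2+6)."""
    counts = {}
    for roll in product(range(1, 7), repeat=3):
        if sum(roll) == total:
            key = tuple(sorted(roll))  # (1, 2, 6), (1, 4, 4), ...
            counts[key] = counts.get(key, 0) + 1
    return counts

print(ways_by_partition(9))   # six partitions, but 3+3+3 occurs in only one order
print(prob_sum(9), prob_sum(10), prob_sum(10) - prob_sum(9))  # 25/216 1/8 1/108
```

Both totals have six partitions, but the partitions of 10 include fewer triples of identical numbers, which is why 10 collects 27 ordered outcomes against 25 for 9.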

In modern language, Pascal's solution to the third problem was that the probability 
of a 6 not appearing when a die is thrown is equal to 5/6. As all the throws are 
independent (one result does not influence another) the probability that a 6 does not 
occur when a die is thrown four times will be (as we shall see in the next chapter): 


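$$P(\text{no } 6) = \frac{5}{6}\cdot\frac{5}{6}\cdot\frac{5}{6}\cdot\frac{5}{6} = \frac{5^4}{6^4} = \frac{625}{1296} \approx 0.482,$$

so the probability of obtaining at least one 6 is

$$P(\text{at least one } 6) = 1 - \frac{625}{1296} = \frac{671}{1296} \approx 0.518 = 51.8\%.$$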


This probability is just over 0.5, so it is favourable to bet that at least one six will
occur; however, many repetitions are required to be able to appreciate the small
difference between 51.8% for at least one 6 and 48.2% for no 6. Once again, we
can appreciate the perspicacity of a compulsive but intelligent gambler.
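A short sketch makes the 51.8% tangible: it computes the exact value with fractions and then estimates it by simulation, assuming a fair die and independent throws. The function names and the fixed seed are illustrative choices.

```python
import random
from fractions import Fraction

def p_at_least_one_six(throws: int = 4) -> Fraction:
    """Exact probability of at least one 6 in `throws` independent throws of a fair die."""
    p_no_six = Fraction(5, 6) ** throws   # (5/6)^4 = 625/1296 for four throws
    return 1 - p_no_six

def simulate(trials: int, throws: int = 4, seed: int = 1) -> float:
    """Relative frequency of 'at least one 6' over many simulated games."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.randint(1, 6) == 6 for _ in range(throws))
        for _ in range(trials)
    )
    return hits / trials

exact = p_at_least_one_six()
print(exact, float(exact))   # 671/1296, roughly 0.518
print(simulate(100_000))     # close to 0.518, but only over many games
```

Even with 100,000 simulated games the estimate wanders by a few thousandths, which suggests how much play De Méré needed to sense the edge.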

To see how Pascal and Fermat solved the problems, we'll analyse the distribution 
problem studied by Pacioli and Cardano a century before. It was not just an 
improvisation, since Pascal had been thinking of the problem for two years before 
communicating it to Fermat. In one of the first letters of those they exchanged 
(which spanned the course of another two years) Pascal recounts his meeting 
with De Méré to Fermat and provides his solution to the distribution problem, 
giving us a clear idea of how he proceeded: “I have here an approximation of how 
I determine the value of each of the games when two players play, for example, 
three games, and each has bet a total of 32 coins. Assume the first has two points 
and the other one. They now play a game in which if the first wins, they win all 
the money at stake, specifically 64 coins; if the other wins, it is two points against 
two points and, consequently, if they wish to conclude, it is necessary for each to
take what they have pledged, 32 coins for each. Consider, sir, that if the first wins,
they will have 64; if they lose, they will have 32. Now, if they do not wish to risk
the game and part without playing, the first must say: ‘I am sure of having 32 coins,
because I will have them even if I lose; however, the other 32 may be either mine
or yours; the chances are the same, hence let us share out these 32 coins half and
half, and you will also give me the 32 coins that are indisputably mine.’ Hence,
he will have 48 coins, and the other 16.”
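Pascal's reasoning works backwards: a player's fair share in any position is the average of his shares in the two positions that the next game could lead to. A minimal sketch of that recursion, assuming every remaining game is a fair 50/50 contest; `share_of_a` is a hypothetical helper name.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def share_of_a(a_needs: int, b_needs: int) -> Fraction:
    """Fraction of the stake owed to A when A still needs `a_needs` points and B
    needs `b_needs`, each remaining game being a fair 50/50 contest."""
    if a_needs == 0:
        return Fraction(1)   # A has already won the whole stake
    if b_needs == 0:
        return Fraction(0)   # B has already won the whole stake
    # The next game is equally likely to go either way, so A's fair share
    # is the average of the two positions that could follow it.
    return (share_of_a(a_needs - 1, b_needs) + share_of_a(a_needs, b_needs - 1)) / 2

stake = 64
a = share_of_a(1, 2)   # first to 3 points: A has 2 (needs 1), B has 1 (needs 2)
print(stake * a, stake * (1 - a))   # 48 16
```

The same call answers De Méré's first problem: of the £60, A should receive 60 × 3/4 = £45 and B £15.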

The letter concludes with the well-known phrase “the Chevalier de Méré is very
talented but he is not a geometer [mathematician]; this is, as you know, a great fault”,
an example of the high esteem in which mathematicians held their own profession,
an esteem not shared by the general public.

Almost simultaneously, Fermat solved the problem by means of a completely
different method, which also generalised the solution. This greatly encouraged
Pascal, who wrote back to him: “I see that the truth is the same
in Toulouse [where Fermat lived] as in Paris.”
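Fermat's approach, as it is usually described, counts instead of recursing: imagine that the largest number of games that could still be needed is always played, and count the equally likely sequences in which A collects enough wins. A sketch under that description, again assuming fair games; `fermat_share` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

def fermat_share(a_needs: int, b_needs: int) -> Fraction:
    """Share of the stake owed to A, found by listing every equally likely
    sequence of the remaining games (Fermat-style enumeration)."""
    games = a_needs + b_needs - 1              # at most this many games settle the match
    sequences = list(product("AB", repeat=games))
    a_wins = sum(1 for seq in sequences if seq.count("A") >= a_needs)
    return Fraction(a_wins, len(sequences))

# A needs 1 point, B needs 2: of AA, AB, BA, BB, only BB goes to B.
print(fermat_share(1, 2))   # 3/4
```

The result agrees with Pascal's 48 coins out of 64, which is exactly the agreement Pascal celebrated.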

As ‘collateral benefits’ of all this reflection, Pascal carried out a series of 
combinatorial studies, published in 1665 in his Treatise on the Arithmetic Triangle, the 
most important contribution and systematisation in the field of combinatorics that 
had been carried out to that date. The book begins with the construction of what 
was from then on known as ‘Pascal's Triangle’, which we have already seen. 
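The rule that builds the arithmetic triangle, in which each inner entry is the sum of the two entries above it, fits in a few lines. This is a minimal sketch; `pascal_triangle` is a hypothetical name.

```python
def pascal_triangle(rows: int) -> list[list[int]]:
    """First `rows` rows of Pascal's arithmetic triangle."""
    triangle = [[1]]
    for _ in range(1, rows):
        prev = triangle[-1]
        # Each inner entry is the sum of the two entries directly above it.
        inner = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        triangle.append([1] + inner + [1])
    return triangle[:rows]

for row in pascal_triangle(6):
    print(row)
```

Row n lists the numbers of ways of choosing 0, 1, ..., n objects from n, which is what links the triangle to counting favourable cases in games.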

Around 1655, the Dutchman Christiaan Huygens (1629-1695) came into
contact with the ideas of Pascal and Fermat through Roberval, who was professor
of mathematics at the Collège Royal in France. Huygens then began to work on problems
related to the calculation of probabilities, and his results were published in the book De
ratiociniis in ludo aleae (The Calculation of Games of Chance) in 1657. In addition to
solving relevant problems related to games, the book uses and explains the concept
of the ‘mathematical expectation’ of a variable with a finite number of values. Huygens
also applied this idea to human life expectancy, based on data collected in
London in relation to rent and insurance claims. He was clearly aware of the
significance of the calculation of probabilities when he wrote: “The reader can
observe that we are concerning ourselves not only with games, but also with the
foundations of a new theory, both profound and interesting.” Probability took off
thanks to games, and it would go on to continue its journey through other areas
of society. It was largely the trio of Pascal, Fermat and Huygens
who laid the foundations for the theory of probability.
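The mathematical expectation of a variable taking finitely many values is the sum of each value multiplied by its probability. A minimal sketch, assuming the probabilities are supplied as exact fractions that add up to 1; `expectation` is an illustrative name.

```python
from fractions import Fraction

def expectation(values, probabilities):
    """Mathematical expectation of a variable taking finitely many values."""
    probabilities = [Fraction(p) for p in probabilities]
    if sum(probabilities) != 1:
        raise ValueError("probabilities must add up to 1")
    return sum(v * p for v, p in zip(values, probabilities))

# Expected score of one fair die: (1 + 2 + ... + 6) / 6
print(expectation(range(1, 7), [Fraction(1, 6)] * 6))        # 7/2
# Pascal's letter: 64 or 32 coins with equal chances is worth 48 coins
print(expectation([64, 32], [Fraction(1, 2), Fraction(1, 2)]))  # 48
```

Pascal's division of the stakes is an expectation in this sense, which is one reason the concept proved so useful.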
PASCAL’S WAGER


As a result of his reflections, Pascal used probability for what, at the end of his life, became his
fundamental occupation: a religious life and the attempt to prove the truth of the Catholic
religion. His argument is known as ‘Pascal's Wager’, and appears in Thought 233 of his Pensées.
It states the following: “Nobody can unequivocally say whether they accept or reject the doctrine
of the Church. It may be true. It may be false. It is like tossing a coin: the probabilities are equal.
However, what about the losses and gains? Suppose we reject the Church. If its doctrine is false,
we have lost nothing. However, if it is true, we will have to confront infinite suffering in hell.
Now let us assume we accept the doctrine of the Church. If it is false, we have gained nothing.
However, if it is true, we obtain eternal beatitude in paradise.”

This argument has been frequently used to induce people to comply with religious precepts: even if
the probability that they are true is small, the expected reward for complying with them is infinite
(eternal glory), so the wager is worth it. An intuitive argument of the same type is used in
social contexts with games that have a small probability of winning but offer large prizes; this
is the basis of their success in society. The probability of winning a large prize is small, but
if we strike it lucky we suddenly become truly rich. Society's verdict is that it is worth the risk,
hence the popularity of these games.
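Written as a comparison of expected values, the wager looks roughly as follows. Here p is an illustrative symbol for the probability that the doctrine is true (not Pascal's own notation), and any finite costs of believing are ignored:

```latex
\begin{aligned}
E[\text{accept}] &= p \cdot (+\infty) + (1-p) \cdot 0 = +\infty \qquad (p > 0),\\
E[\text{reject}] &= p \cdot (-\infty) + (1-p) \cdot 0 = -\infty \qquad (p > 0).
\end{aligned}
```

As long as p is not exactly zero, the infinite term dominates however small p is, which is why the argument does not depend on the coin being fair.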


The development of probability theory 


Building on the Pascal–Fermat correspondence, probability theory became part of the
rapid development of mathematical theories taking place in Europe at the time, with
contributions from a wide range of talents.

It was Jakob Bernoulli (1654-1705) who explicitly laid the foundations for
calculating probabilities for various social, moral and economic issues. His work Ars
Conjectandi (The Art of Conjecture) was published in Basel in August 1713, eight years
after his death, although he had been working on it since 1685, influenced by the
work of Huygens. It defined probability as the degree of certainty with which a
future event may occur. The author explained the title of the work as follows: “Let us
define the art of conjecture, or stochastic art, as the art of evaluating the probabilities
of things as precisely as possible, such that we can always base our judgements and
actions on what has been found to be the best, the most appropriate, the safest, the
most advised; this is the sole purpose of the knowledge of the philosopher and the
prudence of government.”
Pascal's Wager has been debated by many thinkers and philosophers, giving rise to some interesting
and provocative considerations, such as the following, which are provided for reflection. Diderot
makes the following objection: “There are many other large religions, such as Islam, which also
involve salvation through believing in their doctrine. Can Pascal's Wager also be applied to these?
If so, should we embrace each of them?”

William James, in his essay The Will to Believe, provides a simpler version, arguing that
the decision to believe in God is a good wager for us, since in the absence of proof of his
existence either way, one should opt for what will make one happiest throughout life
(which he claimed was believing in the afterlife).

In the context of the uncertainty caused by the existence of the atomic bomb, H.G. Wells observed 
that we do not know whether the world would be able to survive a nuclear holocaust. However, 
it is necessary to live and behave as if we were certain of survival, because “In the end, if the 
optimism was not justified, at least we would have lived in good spirits.” Pascal's Wager continues 
to be widely relevant to this day. For example, it is a recurring theme in the films of Eric Rohmer, 
linked to chance and predestination, and a reflection repeatedly present in the film My Night 
at Maud's. 


Of the four parts of the Ars Conjectandi, the first three represent a continuation 
of the work of Huygens, a systematic collection of combinatorial results and the 
application of all this to games of chance, along the lines of what had been carried 
out previously. However, the fourth part is essentially different. It considers other 
aspects, proves his theorem of large numbers, and also introduces the important idea 
of confidence intervals. 

Bernoulli considers three types of random situations. Firstly, he studies situations 
related to games of chance, in which probabilities are known in advance, since they 
are determined by the rules of the game and can be modelled using hypothetical 
urns. However, he also adds a second type of situation in which probabilities are 
defined a posteriori — known after a large number of experiments. Bernoulli assumed 
that carrying out an increasingly large number of experiments would provide 
increasingly accurate values of the estimated probabilities. He writes: “Let us assume,
without our knowing it, that there are 3,000 white marbles and 2,000 black ones in an urn,
and in order to attempt to determine the number of marbles, we remove one after
the other (returning them to the urn afterwards), observing the frequency with
which black and white marbles appear. Is it possible to do this so often [...] that
the numbers of white and black marbles that have been selected have the same ratio
3:2 as the marbles in the urn, and not a different ratio?”
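Bernoulli's thought experiment is easy to imitate. Because each marble is returned before the next draw, every draw is white with probability 3,000/5,000 = 3/5 independently of the others. The sketch below relies on that assumption, and `urn_ratio` is a hypothetical name.

```python
import random

def urn_ratio(draws: int, white: int = 3000, black: int = 2000, seed: int = 0) -> float:
    """Draw `draws` marbles with replacement and return the observed white:black ratio."""
    rng = random.Random(seed)
    p_white = white / (white + black)   # each draw is white with probability 3/5
    whites = sum(rng.random() < p_white for _ in range(draws))
    blacks = draws - whites
    return whites / blacks if blacks else float("inf")

for n in (10, 100, 10_000, 1_000_000):
    print(n, round(urn_ratio(n), 3))   # the ratio settles ever closer to 3/2 = 1.5
```

The ratio is erratic for a handful of draws and settles near 1.5 as the number of draws grows, which is the behaviour Bernoulli set out to prove.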


NOT SO OBVIOUS 


To understand the difficulty when it comes to dealing with chance, remember that
distinguished mathematicians, with brilliant careers in other areas, often ‘slipped up’ when
it came to probabilities. For example, Gottfried Leibniz (1646-1716), one of the greatest
mathematicians in history, was an enthusiastic player of dice games, which helped him develop
his theories. He was convinced it was equally hard to obtain 11 and 12 when throwing two
dice, claiming something that seemed as evident as it was false: that each result could be
obtained by only one sum (12 = 6+6; 11 = 5+6). The truth is that the probability of 11
is double that of 12, because 11 can be obtained by scoring 5 on the first die and 6 on the
second, or by scoring 6 on the first and 5 on the second, whereas 12 only comes up once,
when there is a 6 on both the first and the second die.
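Treating the two dice as distinguishable, which is the step Leibniz missed, the 36 ordered outcomes can simply be tallied by total. A minimal sketch:

```python
from collections import Counter
from itertools import product

# Tally the 36 equally likely ordered outcomes of two dice by their total.
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))
print(totals[11], totals[12])   # 2 1, so P(11) = 2/36 and P(12) = 1/36
```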


Another important result from Bernoulli's work involves repeatedly tossing
a biased coin, with a probability $p$ of obtaining heads and $q = 1 - p$ of obtaining
tails (the coin is biased, so $p \neq 1/2$). If the coin is tossed twice, the probabilities of
exactly 2, 1 and 0 heads are $p^2$, $2pq$ and $q^2$, which are the terms of the expansion
$(p+q)^2 = p^2 + 2pq + q^2$. Similarly, if the coin is tossed three times, the probabilities of 3,
2, 1 and 0 heads are the respective terms of $(p+q)^3 = p^3 + 3p^2q + 3pq^2 + q^3$. Generalising
this result, if we toss the coin $n$ times, the probability of obtaining exactly $m$ heads
is equal to

$$P(m) = \binom{n}{m} p^m q^{n-m},$$

which is the corresponding term of the expansion of $(p+q)^n$. This gives us the ‘binomial
distribution’.
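The binomial formula translates directly into code. This sketch uses an illustrative biased coin with p = 0.3 (an assumed value) and checks that the terms for n = 3 add up to $(p+q)^3 = 1$; `binomial_pmf` is a hypothetical name.

```python
from math import comb

def binomial_pmf(m: int, n: int, p: float) -> float:
    """Probability of exactly m heads in n independent tosses with P(heads) = p."""
    q = 1 - p
    return comb(n, m) * p**m * q**(n - m)

p = 0.3   # an illustrative biased coin
probs = [binomial_pmf(m, 3, p) for m in range(4)]
print(probs)        # m = 0, 1, 2, 3 heads: q^3, 3pq^2, 3p^2q, p^3
print(sum(probs))   # (p + q)^3 = 1, up to floating-point rounding
```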
Bernoulli himself was aware of the importance of his result and its applications
when he wrote: “I attach much greater value to this invention [the extension of the 
theory of probability to different areas of games of chance] than if I had managed 
to square the circle, because if a way was found to achieve the latter, it would be 
of little use.” 

Twenty years after the publication of Bernoulli's book, the Comte de Buffon posed his
famous ‘Needle Problem’ (presented in 1733 and later included in his Essai d’arithmétique
morale). A needle of length L/2 is thrown onto a wooden floor made of planks of width L.
What is the probability that the needle falls across one of the gaps between the planks?
Here, geometry appears for the first time in a probability problem. Furthermore, the
solution to the problem is strangely related to a famous number, π! Once again, distant
and unrelated areas of mathematics show themselves to be closely related. In general, a
needle of length ℓ thrown onto lines a distance d ≥ ℓ apart crosses one of them with
probability 2ℓ/(πd); with ℓ = L/2 and d = L, the probability is 1/π, making it possible to
estimate the value of π experimentally, to ever greater precision, simply by increasing
the number of trials.
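A Monte Carlo version of Buffon's experiment can be sketched as follows, taking L = 2 so that the needle has length 1 and the lines are 2 apart. To avoid using π to estimate π, the needle's direction is drawn by picking a random point in the unit disc (rejection sampling); this is an implementation choice, not part of Buffon's formulation. `buffon_estimate_pi` is a hypothetical name.

```python
import math
import random

def buffon_estimate_pi(trials: int, seed: int = 0) -> float:
    """Estimate pi from the crossing frequency of a needle of length L/2 on lines L apart."""
    rng = random.Random(seed)
    gap = 2.0           # distance L between neighbouring lines
    half_needle = 0.5   # half the needle's length (the needle is L/2 = 1 long)
    crossings = 0
    for _ in range(trials):
        centre = rng.uniform(0, gap / 2)   # distance from needle centre to nearest line
        # Random direction without using pi: a uniform point in the unit disc.
        while True:
            x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
            r = math.hypot(x, y)
            if 0 < r <= 1:
                break
        sin_theta = abs(y) / r             # sine of the angle between needle and lines
        if centre <= half_needle * sin_theta:
            crossings += 1
    # The crossing probability is 1/pi, so pi is roughly trials / crossings.
    return trials / crossings if crossings else float("inf")

print(buffon_estimate_pi(200_000))
```

Convergence is slow: with 200,000 throws the estimate is typically correct only to about two decimal places.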


The behaviour of a needle that falls onto a series of parallel lines, without becoming embedded. 
One of the Comte de Buffon’s ideas that mixed geometry and the calculation of probabilities.


Another significant figure in the development of the theory of probability
is Abraham de Moivre (1667-1754), born in France but exiled to England as a
result of one of the numerous cases of religious persecution in European history.
(In this case, it was caused by Louis XIV's revocation in 1685 of the Edict of Nantes,
which had guaranteed freedom of religion; this forced De Moivre, who was a
Huguenot, to leave his country.) He was a member of the Royal Society and a
close friend of Newton. His book The Doctrine of Chances (which he revised in
successive editions as his work progressed) follows the path set out
by Huygens and Bernoulli, but applies the ideas of infinitesimal calculus being
developed at the time.

De Moivre extended Bernoulli’s work to biased coins. When the number of tosses
of the coin and the number of heads are large, it is difficult to calculate the
binomial coefficients accurately, and De Moivre derived an approximate formula that related
the ‘binomial distribution’ previously discovered by Bernoulli to the error function
or normal distribution:

$$\binom{n}{m} p^m q^{n-m} \approx \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{(m-np)^2}{2npq}}.$$

De Moivre was the first to make this connection explicit, and, as we shall
see, it proved fundamental to the development of probability and statistics.
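The quality of the approximation can be checked numerically by comparing exact binomial terms with the normal curve of mean np and variance npq. The values n = 1000 and p = 0.5 below are illustrative assumptions, as are the function names.

```python
import math

def binomial_pmf(m: int, n: int, p: float) -> float:
    """Exact probability of m heads in n tosses."""
    return math.comb(n, m) * p**m * (1 - p) ** (n - m)

def de_moivre_approx(m: int, n: int, p: float) -> float:
    """Normal-curve approximation to the same term (mean np, variance npq)."""
    mean, var = n * p, n * p * (1 - p)
    return math.exp(-((m - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

n, p = 1000, 0.5
for m in (480, 500, 520):
    print(m, binomial_pmf(m, n, p), de_moivre_approx(m, n, p))
```

For a thousand tosses the two columns already agree to within a small fraction of a percent, while the normal formula avoids computing enormous binomial coefficients.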

In 1773, Pierre-Simon Laplace wrote his first paper on probability, at a time when 
infinitesimal calculus had already been extensively used. His paper concentrated on 
mathematical aspects, leaving the philosophical foundations of probability, which 
had occupied his predecessors, to one side. Later on, in 1820, in his Philosophical 
Essay on Probabilities, the introduction to the third edition of his monumental work 
Analytic Theory of Probabilities (the first edition of which appeared in 1812), Laplace 
makes explicit his well-known profession of ‘deterministic’ faith: “We must regard 
the current state of the Universe as the effect of its previous state and as the cause 
of that which will follow. An intelligence that, at a given moment in time, was 
aware of all the forces that act on nature and the respective position of the beings of
which it is made up, if it were also sufficiently vast to subject all this information to 
analysis, would summarise the movements of the largest bodies of the Universe and 
the slightest atoms in the same formula. Nothing would be uncertain for such an 
intelligence, and both the future and the past would appear before its eyes.” 

However, we should point out that Laplace was not merely claiming that a
higher intelligence would be able to calculate all the effects of the laws of nature;
his goal was to develop the science of probabilities in order to gain a greater
understanding of these natural laws. Hence, he arrived at his famous conclusion that
“In this essay it has been seen that the theory of probabilities is essentially nothing
more than common sense reduced to calculation. It makes us precisely aware of
what sound minds feel by a sort of instinct, often without being able to account for it.”

The first part of his two-part work develops the theory of generating functions
and the series used to approximate formulae that are functions of large numbers,
whereas the second deals with the general theory of probabilities.
With Laplace and those immediately following him, above all
Gauss and Legendre, the theory of probability matured, and the remaining task was
the systematisation, perfection and criticism of the results. This took place throughout
the 19th century, not just within the realm of mathematics, but also increasingly
in other fields, the most notable of which was statistical mechanics, which opened
the path into physics.


PIERRE-SIMON LAPLACE (1749-1827) 


The French astronomer and mathematician is responsible for some of the greatest advances in
probability, and he has been popularly honoured by having one of the best-known ways of
assigning probabilities named ‘Laplace's rule’. Although his family wanted him to dedicate his
life to the Church, Laplace found his true calling in mathematics. In addition to his scientific
skills and capabilities, Laplace showed considerable talent when it came to society and politics.
He obtained a position at the military school, where he carried out some of his most important
research. He became examiner of the royal artillery and, as such, had the occasion to examine
a promising and brilliant young man who was just 16 years old and showed great interest in
mathematics. His name was Napoleon Bonaparte.

After the French Revolution, Laplace was able to show his qualities. He was a supporter of
republicanism and survived the Terror, in contrast to other scientists who were close to him,
such as Lavoisier, the father of modern chemistry, who literally lost his head. Later, when
Napoleon came to power, Laplace put aside his republican ideals and was appointed Minister of
the Interior, although he only lasted about a month in the post. He went on to hold various
political positions, and he had earlier had the good sense to support Napoleon's election to the
Paris Academy of Sciences, thus ensuring his support. With the defeat of the emperor and
the restoration of the Bourbon monarchy, Laplace once again adapted to the times and was
made a marquis by Louis XVIII. His name appears in the list of the 72 distinguished
scientists engraved on the Eiffel Tower.
There have been many distinguished figures over the centuries, including Poisson,
De Morgan, Cournot and Tchebycheff, who founded the Russian school and whose
followers included Markov and Liapunov, paving the way for Kolmogorov, who
formulated an axiomatic theory of probability, which we shall see in the following
chapter. Kolmogorov was aware that his work represented the culmination of a long struggle
against uncertainty, stating: “The epistemological value of probability theory is
based on the fact that on a large scale, random phenomena exhibit statistical regularity,
whereby the randomness, in a certain sense, disappears.” This led the way to the next
level in the conquest of chance.

Another figure who deserves a special mention is the Belgian mathematician
Adolphe Quetelet (1796-1874), who developed a special interest in the applications
of probability and statistics to various human activities, and he is now regarded
as the founder of modern statistics. He compiled social information and described it
in terms of the normal law, which he called the ‘law of accidental causes’.

ANDREI NIKOLAEVICH KOLMOGOROV (1903-1987)

After having dedicated himself to various occupations in the early years of his life, he began
to study mathematics in Moscow, and as a student was already carrying out research of
international importance. He taught mathematics at the University of Moscow and worked in
various fields, such as analysis and topology, although his contributions to probability theory
are of particular importance, together with their application to dynamical systems. In 1965, he
received the Lenin Prize, the highest civil award in the Soviet Union.


In 1835 Quetelet proposed the use of the normal curve for modelling all sorts of social
information (births, deaths, crime, suicide rates, etc.). He knew that while such
events are unpredictable at an individual level, they exhibit statistical patterns when
observed for entire populations. He gave form to this idea by discussing the ‘average
man’, a term that came to be widely used and depicts a fictitious character who is
average in all respects. However, Quetelet did not just regard the average man as a
mathematical concept, but as the aim of social justice.

In 1844 he astonished sceptics by applying the normal law to the distribution
of the heights of men, which allowed him to uncover fraud in the ‘evasion’ of
military service in France. His analysis showed that some 2,000 young
men had avoided being called up by fraudulently stating a height below the
minimum required for military service.


A recent episode from probability 


In a television competition, when the contestant reaches the final stage, they stand 
in front of three closed doors. Behind one of the doors is a grand prize, such as a car, 
whereas behind the others there is nothing. The contestant must choose one of the 
doors. When they have indicated the door they have chosen, the presenter, who knows
where the prize is, continues the show by opening one of the other two doors, one that
does not contain the prize, giving the contestant another opportunity. They can, of course,
stick with their choice, but they can also abandon the door they have chosen and switch to
the other closed door. Which is best? Does it matter whether they stick with the door or switch?

The choice seems trivial, as if it did not matter what the contestant does.
However, this problem (known as the ‘Monty Hall problem’) has given rise to
considerable debate in recent times, debates in which distinguished mathematicians
have taken part, revealing that probabilities are far from being a well-understood
phenomenon.

From 1963 to 1990, a television programme was shown in the United States 
entitled Let’s Make a Deal, presented by Monty Hall, and the eponymous problem 
is based on this competition. Additionally, for a number of years, in more than 
300 newspapers throughout the same country, a column was published entitled 
‘Ask Marilyn’, containing questions and answers. The woman behind the column, 
Marilyn vos Savant, is famous for being listed in the Guinness Book of Records as the
person with the highest IQ in the world, with a value of 228. Her fame grew further
through her marriage to the doctor and scientist Robert Jarvik, who invented
the artificial heart. One Sunday in September 1990, she published the question
from the above problem in her column: “Is it better for the contestant to switch from
the door they had first selected?”
In her column, Marilyn claimed it was better to switch doors. This resulted in a
flood of letters from her readers (around 10,000) who were almost unanimous: 92%
said she was mistaken, adding that they were disappointed that a person
like her could give the wrong answer to such a simple question. She even received
letters from many indignant mathematics teachers. They contained comments such
as: “Let me explain. If one of the losing doors is revealed, this information changes
the probability of any selection that has been made, none of which has any reason
to have a probability above 1/2. As a mathematical professional, I am extremely
concerned at the lack of mathematical ability among the general public. Please
help by admitting your error and, in the future, take more care.” Or another: “I am
deeply moved; after having been corrected by at least three mathematicians, you
are still unable to see your error.” And also: “How many indignant mathematicians
are required to make you change your mind?” Replies continued to arrive in great
quantities for a long time until, after dedicating more space in her column to the
problem, the author decided to settle the matter once and for all.

The scandal was also repeated in Holland for the same reason in 1995, in the 
newspaper NRC-Handelsblad. However, the fact is that vos Savant was actually right, 
and all those who wrote to her, regardless of whether they were mathematicians, 
were mistaken. 


Diagram: three cases, headed ‘You select a door with a goat behind it’, ‘You select a door with a goat behind it’ and ‘You select a door with a car behind it’.
The diagram is provided with the original problem, in which there is a consolation
prize (a goat) behind each of the doors without the jackpot. It shows that if the contestant
sticks with the door they have chosen, their probability of winning is obviously 1/3 (there
were three doors and they chose just one), whereas if they switch, it becomes 2/3, doubling
the probability. The reason is that the presenter always opens a losing door: in the two cases
out of three in which the first choice hides a goat, the only other closed door is the one with
the car, so switching wins.

In addition to the diagram, it is also possible to use experiments, for example by
taking three cards, one of which has the prize on it, shuffling them and choosing one;
a friend who knows where the prize is then reveals a losing card among the other two,
and the chosen card is swapped for the remaining one. To obtain a significant number of
trials, the reader can do this with friends one evening or during the holidays, and they
will see that the results confirm these probabilities. Alternatively, they can carry
out a computer simulation (if they suspect there is a trick). Such simulations can be
found on various websites, for example by searching for ‘Stick or Switch’.
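A simulation of this kind fits in a few lines. The sketch assumes the rules described above: the presenter knows where the car is, always opens a losing door that the contestant did not choose (picking at random when two such doors are available), and always offers the switch. `monty_hall` is a hypothetical name.

```python
import random

def monty_hall(trials: int, switch: bool, seed: int = 0) -> float:
    """Fraction of games won by a contestant who always sticks or always switches."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        choice = rng.randrange(3)
        # The presenter opens a door that is neither the contestant's nor the car's.
        opened = rng.choice([d for d in range(3) if d != choice and d != car])
        if switch:
            # Move to the only door that is still closed and not the original choice.
            choice = next(d for d in range(3) if d != choice and d != opened)
        wins += choice == car
    return wins / trials

print("stick: ", monty_hall(100_000, switch=False))   # close to 1/3
print("switch:", monty_hall(100_000, switch=True))    # close to 2/3
```

If the presenter could accidentally reveal the car, the analysis would change, which is why the assumption that he always opens a losing door matters.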

It is surprising that there was resistance not only among mainstream mathematicians,
but also from the Hungarian Paul Erdős, one of the greatest mathematicians of the
20th century, who claimed that the solution was impossible and would only
accept his error after checking it with a computer simulation.
Probability and Chance


In everyday life we are surrounded by information that tells us of the possibilities 
of something happening, the probabilities of winning a prize, strange coincidences, 
the probability of a light bulb lasting for more than 1,000 hours, the possibilities 
of a team winning the league or simply dozens of surveys that tell us what we 
think on a wide range of issues. 

The majority of the examples above are related to facts or events for which it 
is possible to know all the possible results, but the specific result of which cannot 
be predicted when it occurs. These are phenomena or experiments subject to 
‘chance’. However, one of the events that has been mentioned does not fit into 
this category. It is hard to believe that if the best football team in Europe takes 
on a group of random amateurs, the result of the game will be subject to chance. 
Accordingly, here we shall restrict our analysis to experiments or facts that can be 
regarded as ‘random’ or subject to the laws of chance. 

What is chance? A dictionary defines it as a coincidence, a fortuitous case, 
and states that the expression ‘by chance’ means ‘without order’. It is possible that 
if each of us is asked what we understand by the word ‘chance’, we would find 
ourselves without a clear answer and would have to rely on examples of what we 
often call ‘games of chance’. This indicates that chance is difficult to define, despite 
having identified and internalised its meaning, to the extent that we know whether 
we should play a given game of chance and under what conditions. The idea of 
probability is intimately related to the idea of chance and helps us understand our 
possibilities of winning a game of chance or analyse surveys. 

Laplace claimed:“It is notable that a science that started out considering games 
of chance has come to be the most important object of human knowledge.” Two 
centuries later, this statement is increasingly evident, not only in everyday life, but 
also in science, technology and the social sciences. Understanding and studying 
chance is indispensable, because using probability is necessary for making decisions 
in any area. To better understand the situations to which we refer, note that the 
results of certain phenomena that take place around us can be easily predicted, 
so-called deterministic phenomena. When a deterministic experiment is carried out,
its result can be anticipated with certainty based on certain initial data. The result
only changes if the initial data on which it depends changes. Hence, if
we allow an object to fall from a specific height, we can be sure it will fall to the
ground, and we can calculate the speed at which it will hit the ground and the time it
will take to do so.

There are many other phenomena in which this does not occur, but in which 
it is possible to obtain different and unpredictable results from the same starting 
situation. We'll call them random phenomena. A typical example is throwing a die. 
Even if we always try to throw it the same way, the result is always unpredictable. 
The result of a random experiment depends on chance. We might think that by 
measuring a series of data for the initial position of the die precisely, the angle 
and force with which we throw it, the friction of the air, etc. and by knowing 
the equations of motion, it might be possible to predict the result. In this respect, 
Laplace said that probability is the measure of our ignorance. However, what is 
certain is that small variations (which are difficult to measure) lead to different 
results. This is chance. 

However, the fact that a phenomenon is random does not mean that it is completely
unpredictable. This is where the study of probability, which has gradually come to occupy
a space in the history of humanity, becomes so important. As can be seen in many
situations and games, when the same random experiment is repeated many times
over, numerous regularities appear. It is impossible to determine the result of a single
throw of a die, but it is possible to be almost certain of the proportions obtained in thousands of
throws. It can be said that: “Order appears in randomness with the passing of time
and repetitions.” As Arthur Conan Doyle (1859-1930), the creator of Sherlock
Holmes, once wrote in reference to society at large: “While the individual man is an
insoluble puzzle, in the aggregate he becomes a mathematical certainty. Individuals
vary, but percentages remain constant. So says the statistician.”

The same thing happens with the die and other random situations: each throw varies,
but the proportions stay the same. When a random experiment is repeated, it also
becomes clear that, in general, not all results appear in the same proportion: some
events occur more often than others. If we throw a ‘normal’ die, all the sides appear
in the same proportion; however, if we throw two ‘normal’ coins and count the
number of heads that appear (0, 1 or 2), it is more common to see just one head,
since this can happen in two ways (heads-tails or tails-heads). If we take a card from
the traditional Spanish pack (40 cards distributed across four suits) and check whether
it is a number card or a face card, the two do not occur with the same frequency.
The probability of an event is an indication of the possibility that it might occur.
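The two-coin example can be made exact by listing the four equally likely outcomes and grouping them by the number of heads. A minimal sketch, assuming fair coins:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of two fair coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))
heads = Counter(o.count("H") for o in outcomes)
dist = {k: Fraction(v, len(outcomes)) for k, v in sorted(heads.items())}
print(dist)   # one head is twice as likely as no heads or two heads
```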


Definition of probability 
We have already noted that the probability is a number indicating the likelihood of
observing a result of a random phenomenon or experiment. However, the allocation
of this probability is something that can be intuitive in many cases but absolutely
cryptic in others.


Experiments with statistical regularity 


Let us begin with some ideas and concepts that appear upon repeating a random 
experiment. We'll throw two dice and calculate the difference between the results. 
The table gives us the count of the results, first with 189 real throws, followed by 
the results for computer simulations of 50,000, 100,000 and 1,000,000 throws of 
two dice. 


[Table: absolute frequencies of each difference between the two dice (0 to 5), for 189 real throws and for 50,000, 100,000 and 1,000,000 simulated throws.]


The absolute frequencies (the number of times each difference is observed) in
the table do not give us much information. Instead, it is better to take the quotient
of the number of times each result occurs and the total number of trials carried out,
referred to as the relative frequency of each result. These values, provided in the
following table, give us more information and make it possible to draw clearer
conclusions:
The table provides empirical evidence of one of the principles that govern the
behaviour of chance: statistical regularity. This means that as the number of times a
random experiment is carried out increases, the relative frequency of each of the
results grows closer to a specific value. Later, we shall see the theoretical formulation
of this principle, which is known as the law of large numbers.
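As a rough check of this statistical regularity, the simulation below is a minimal sketch (not the program used to build the table) that throws two virtual dice and records the absolute difference between them. It assumes a fair random generator (Python's `random` module) and compares the relative frequencies with the exact values 6/36, 10/36, 8/36, 6/36, 4/36 and 2/36, obtained by counting the 36 equally likely pairs of faces, which anticipates the Laplace rule introduced later. The function names are illustrative.

```python
import random
from collections import Counter
from fractions import Fraction


def exact_difference_probs():
    """Exact probability of each |die1 - die2|, counting the 36 equally likely pairs."""
    counts = Counter(abs(a - b) for a in range(1, 7) for b in range(1, 7))
    return {d: Fraction(counts[d], 36) for d in range(6)}


def relative_frequencies(n_throws, seed=None):
    """Relative frequency of each difference 0..5 after n_throws simulated throws of two dice."""
    rng = random.Random(seed)
    counts = Counter(abs(rng.randint(1, 6) - rng.randint(1, 6)) for _ in range(n_throws))
    return {d: counts[d] / n_throws for d in range(6)}


if __name__ == "__main__":
    exact = exact_difference_probs()
    for n in (189, 50_000, 100_000):
        freqs = relative_frequencies(n, seed=1)
        gap = max(abs(freqs[d] - float(exact[d])) for d in range(6))
        print(f"{n:>7} throws: largest gap from the exact values = {gap:.4f}")
```

The largest gap typically shrinks as the number of throws grows, although a single run with few throws can stray noticeably from the exact values.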

The probability of an event is the value approached by its relative frequency 
when the experiment is repeated a large number of times (in mathematics this is 
called the limit value). As we have already mentioned, the probability of an event is 
a measurement of the possibility of it occurring. 

Since the probability is the limit value of the relative frequency, it satisfies the 
properties of relative frequencies: 


1. The probability prob(E) of an event E is a number between 0 and 1, since
the number of times it occurs must be between 0 and the total number of
times the experiment is carried out: 0 ≤ prob(E) ≤ 1.

2. The probability of an impossible event is 0. The probability of a certain
event is 1. These two properties make it possible to establish a scale of
probabilities. If a result never occurs, its probability is 0; if it always occurs,
its probability will be 1; if it occurs occasionally, it will be a number between
0 and 1, with events becoming more probable as their probability
grows closer to 1. When something happens nearly all the time, we say it is
highly probable: its probability will be close to 1. When something occurs
rarely, we say it is improbable: its probability will be close to 0. However,
there is an important difference between a high probability (even a very
high probability) and certainty: even if an event has a very high probability,
we cannot be sure that it will occur in a specific experiment.

3. The probability of an event made up of various different results is
equal to the sum of the probabilities of the elementary events (results) of which it
is made up.

4. The sum of the probabilities of all of the results of a random experiment 
(elementary events) is equal to 1. 

5. The sum of the probabilities of two complementary events is equal to 
1: prob(E) + prob (no E)=1. 


This property is of interest since in many cases it is possible to find the probability 
of an event if we know its opposite (which may be easier to find). For example, the 
event ‘more than one throw of a die is needed to obtain a six’ is the complement 
of ‘obtaining a six with just one throw’. As we know, the probability of obtaining a
six in a single throw is p = 1/6 ≈ 0.167, making the complement:


prob(needing more than one throw) = 1 − 1/6 = 5/6 ≈ 0.833.
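The complement rule can be checked with a short sketch, assuming a fair die simulated with Python's `random` module. The exact value is computed with fractions, and the hypothetical helper `estimate_more_than_one_throw` estimates it by counting how often the first throw is not a six, which is exactly when more than one throw is needed.

```python
import random
from fractions import Fraction

p_six = Fraction(1, 6)
p_more_than_one = 1 - p_six  # complement rule: prob(no E) = 1 - prob(E)


def estimate_more_than_one_throw(n_trials, seed=None):
    """Fraction of trials whose first throw is not a six, i.e. a six needs more than one throw."""
    rng = random.Random(seed)
    misses = sum(1 for _ in range(n_trials) if rng.randint(1, 6) != 6)
    return misses / n_trials


print(p_more_than_one)                                # 5/6
print(estimate_more_than_one_throw(100_000, seed=0))  # close to 0.833
```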


It should be noted that we need a ‘large’ number of repetitions to assign probabilities 
using this process of repeated experimentation. That is to say, the aforementioned 
law of large numbers should not be confused with what is sometimes humorously 
referred to as the ‘law of small numbers’, which we apply so many times in our own 
lives by drawing general conclusions from a limited number of experiences. If, for 
example, we have travelled to another country and have been robbed in the street, 
and we are aware of two other cases of travellers who have suffered the same fate (it 
may be the case that they have told us when we tell them about our experience), 
without any more examples, we decide that the probability of being robbed in that 
country is high, although the experience of large groups of travellers and objective 
statistics say otherwise. 

It may be that the human brain is predisposed to draw general conclusions that give it
rules to guide its actions, even though it needs rules for every kind of action and individual
experience offers only a limited number of repetitions of any given event.
for each action. The reality is that we are quite happy to draw general conclusions, 
making use of this false law of small numbers, and not only in private conversations, 
but also in the media, such as the ‘infallible’ method described by a journalist for 
determining the scale of unemployment: 
“Ask yourself how many unemployed there are in your family; then ask your
neighbours, friends and acquaintances, and add up the results you obtain. Compare 
this figure with the number of people that make up your family and those of all 
the people you have asked. This provides an infallible measure of the number of 
unemployed.” 

As can be seen, generalising from limited experience often leads to false conclusions.


Equiprobable events 


It is not always necessary to fall back on experiments to assign the probability of an
event. Upon throwing an unbiased die, its properties of symmetry allow us to assume
that no face is more likely to come up than any of the others. This means that the
six possible results of throwing a die are equiprobable; in other words, the probability of
each is 1/6 ≈ 0.167 (in the long run, each face appears about 16.7% of the time).

Taking a card from the Spanish deck, with 12 face cards out of the total of 40,
it seems reasonable to believe that a face card will appear 30% of the time (12/40 · 100 = 30).
Hence, we can say the probability is 12/40 = 0.30.

Using these two examples as a reference, we can see that there are occasions on 
which the probabilities of the various results are intuitive. The conditions of symmetry 
and regularity of the die are sufficient to assign the same possibilities to each of its 
sides: 1/6. A similar logic can be used for the pack of cards. 

Hence, in many cases, the probability can be defined as follows: if it is possible 
to obtain N different results upon carrying out a random experiment and it can be 
guaranteed that each has the same possibility of appearing (they are equally possible), 
the probability for each of the results is p= 1/N. 

In the case of the die, the set of all the possible results (referred to as the sample 
space) is E={1, 2, 3, 4, 5, 6}. If the die is unbiased, each of the elementary events 
will be equally possible. Hence, its probability will be: 


p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6 ≈ 0.167 (16.7%).


Let’s now consider compound events, such as ‘obtaining an odd number’ or
‘obtaining a multiple of 3’. What is their probability? If the result of the throw is 1,
3 or 5, the event ‘obtaining an odd number’ occurs. As the probability of
a compound event is equal to the sum of the probabilities of the elementary events
of which it is made up (applying property three above), we have:
$$\text{prob(odd number)} = p(\{1,3,5\}) = p(1) + p(3) + p(5) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6} = 0.5 \ (50\%).$$


The probability of obtaining a multiple of 3 will be: 
$$\text{prob(multiple of 3)} = p(\{3,6\}) = p(3) + p(6) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} \approx 0.333 \ (33.3\%).$$


Consider the results: 


p(odd number) = 3/6, p(multiple of 3) =2/6. 


These two probabilities show that in each case, the probability is equal to the 
quotient of the number of elementary events that make up the compound event 
(3 in the first case and 2 in the second) and the total number of results or elementary 
events that may occur. 

Arguing along the same lines for the general case, we can state that, if N elementary 
events can occur in a random experiment, all of which are equally possible, the 
probability of an event E made up of n elementary events will be: 


p(E)=1/N+1/N+...+1/N (n terms equal to 1/N)=n/N, 


hence: 


$$p(E) = \frac{\text{Number of elementary events that make up event } E}{\text{Total number of elementary events}}.$$


In other words, the probability of an event E is equal to the quotient of the 
number of favourable results for which the event occurs and the total number of 
possible results. This definition is the famous Laplace rule: “If all elementary events 
are equally possible, the probability of an event E is the quotient of the number of 
favourable cases for E and the number of possible cases for the event.” 


$$p(E) = \frac{\text{Number of cases favourable to } E}{\text{Number of possible cases}}$$
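Laplace’s rule translates almost directly into code. The sketch below is illustrative (the helper name `laplace_probability` is ours): it counts favourable and possible cases over a finite list of outcomes and returns an exact fraction. It silently assumes what the rule requires, namely that the outcomes are finite in number and equally likely; it cannot check that for you.

```python
from fractions import Fraction


def laplace_probability(sample_space, event):
    """Favourable cases / possible cases.

    Only valid when the sample space is finite and every outcome in it is equally likely.
    `event` is a predicate that says whether an outcome is favourable.
    """
    outcomes = list(sample_space)
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


die = range(1, 7)  # an unbiased ('Laplace') die
print(laplace_probability(die, lambda x: x % 2 == 1))  # odd number: 1/2
print(laplace_probability(die, lambda x: x % 3 == 0))  # multiple of 3: 1/3
```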
CARDS


One tool commonly used to generate situations of chance is playing cards. There are
two main types of cards: the Spanish and Anglo-American-French packs. Both have
become deeply embedded in the collective imagination since they are, as is the case
with all great games, models of recognisable social situations.
The Spanish pack of cards is a stylisation of mediaeval society, with four suits (coins,
cups, swords and clubs) representing the four fundamental social classes of the time:
the bourgeoisie (or merchants, with golden coins); the clerics (with the cup for liturgical
celebrations); the nobility (with the gentleman's sword) and the peasantry (with the club
or stick used for manual labour). There are 40 cards in the pack, ten for each suit, three
of which (jack, knight and king) are face cards.

The Anglo-American-French pack of cards is a stylisation of the passing of time,
specifically the events of a year. The four suits represent the four seasons. Each has
13 cards, giving a total of 4 · 13 = 52 cards, which correspond to the 52 weeks of the
year. Adding the numbers of each suit (1 + 2 + 3 + ... + 12 + 13 = 91), multiplying by
the four suits and adding the joker gives 365 (4 · 91 + 1 = 365), the 365 days of the year.


Laplace’s rule or law is regarded as the classic definition of probability on account 
of being the first one known. It is extremely useful for calculating the probabilities 
of compound events in situations of equiprobability (when all events are equally 
possible). The only thing we need to do to find the probability of an event is count 
the total number of elementary events (possible cases) and the number of elementary 
events that make up the event (favourable cases). To count the cases we can use the
combinatorial techniques we have already encountered.
Such is the influence of this definition when it comes to dealing with probability
that when all the results are equiprobable for a die, a coin, a roulette wheel, etc., the
adjective ‘Laplace’ is used. Hence, instead of referring to an ‘unbiased die’, it is
common to refer to it as a ‘Laplace die’. Before applying the rule, it is extremely
important to ensure that the experiment we are dealing with has a finite number of
possible results and that all its elementary events are equiprobable (in this respect,
it is helpful to use common sense, but with caution, because probability frequently
deceives us). The finiteness condition may seem ‘strange’ to readers who are unfamiliar
with mathematics, but a simple example can show us that an experiment can have an
infinite number of results.

Imagine that a dart is thrown at a dartboard. If we consider all the points of the 
dartboard as possible results, these are infinite, meaning that the rule above cannot be 
applied. However, if we simplify the experiment and divide the dartboard into four 
squares with an equal area, we have now solved the problem, since there are now only 
four possible results, which are equally possible (assuming the person throwing the 
darts is not an expert and throws them randomly at the board). However, dividing the
board into sectors with different areas, as on competition dartboards, does not allow us
to apply the above rule either, since not all the results (areas) have the same possibilities.
In such cases we need to make a slight modification to the previous rule, which consists
of assigning to each sector, as its probability, the quotient of its area and the total area
of the dartboard.
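The area rule can be illustrated with a Monte Carlo sketch. It assumes, hypothetically, a 1 × 1 square board hit uniformly at random (the ‘non-expert’ thrower), so the proportion of darts landing in a region should approach the region’s area divided by the board’s area. The regions and function names are illustrative only.

```python
import math
import random


def estimate_region_probability(in_region, n_throws, seed=None):
    """Fraction of uniformly random points on the unit-square board that land in the region."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_throws) if in_region(rng.random(), rng.random()))
    return hits / n_throws


def top_left_square(x, y):
    """One of four equal squares: area 1/4 of the board."""
    return x < 0.5 and y >= 0.5


def central_disc(x, y):
    """A disc of radius 0.2 at the centre: area pi * 0.2**2, about 0.126 of the board."""
    return (x - 0.5) ** 2 + (y - 0.5) ** 2 < 0.2 ** 2


for region, area in ((top_left_square, 0.25), (central_disc, math.pi * 0.2 ** 2)):
    estimate = estimate_region_probability(region, 100_000, seed=0)
    print(f"{region.__name__}: estimate {estimate:.3f}, area ratio {area:.3f}")
```

The estimates only approximate the area ratios; the rule itself (area of the sector over total area) is what assigns the exact probabilities.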

Laplace’s rule can be applied to the majority of games of chance we can imagine. 
Where this is not the case, it is necessary to specify how the probabilities are assigned. 

Let us consider some examples to help us understand. If we take a card at 
random from a Spanish deck, what is the probability that it is a club? What is the 
probability that it is a face card? And what is the probability that it is a club and 
a face? This random experiment has 40 possible cases, the same as the number of 
cards, all of which are equiprobable in a new, unmarked pack. Of the 40 cards, 
10 are clubs: hence there are 10 favourable cases for the event ‘obtaining a club’. 
The probability is: 


$$p(\text{club}) = \frac{10}{40} = \frac{1}{4} = 0.25 \ (25\%).$$


Similarly, the probabilities of the other proposed events are: 
$$p(\text{face}) = \frac{12}{40} = 0.30 \ (30\%), \qquad p(\text{club and face}) = \frac{3}{40} = 0.075 \ (7.5\%).$$
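The same three answers can be obtained by brute force: build the 40-card Spanish pack, treat every card as equally likely, and count. In this sketch the number cards are labelled 1 to 7 as an assumption (the text only specifies the suits and the three face cards); the labels do not affect the counts.

```python
from fractions import Fraction
from itertools import product

SUITS = ("coins", "cups", "swords", "clubs")
# Assumed labels: seven number cards plus the three face cards in each suit.
RANKS = (1, 2, 3, 4, 5, 6, 7, "jack", "knight", "king")
FACES = {"jack", "knight", "king"}
DECK = list(product(SUITS, RANKS))  # 40 equally likely (suit, rank) cards


def prob(event):
    """Laplace's rule over the deck: favourable cards / 40."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def is_club(card):
    return card[0] == "clubs"


def is_face(card):
    return card[1] in FACES


print(prob(is_club), prob(is_face), prob(lambda c: is_club(c) and is_face(c)))  # 1/4 3/10 3/40
```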


Another example: if we take two cards from the Spanish pack, what is the 
probability that both are coins (one of the suits)? Here, all the groups of two cards 
we can choose have the same probability, and hence the Laplace definition can be 
applied. The possible cases correspond to the ways of extracting two cards from the 
total of 40, that is, the number of combinations of 40 elements (the 40 cards) choose 2:


$$C_{40,2} = \binom{40}{2} = \frac{40 \cdot 39}{2 \cdot 1} = 780.$$


There are 10 cards in the coins suit, meaning that the favourable cases will be
the different ways of choosing 2 of the 10 coins, hence the combinations of 10
elements choose 2:


$$C_{10,2} = \binom{10}{2} = \frac{10 \cdot 9}{2 \cdot 1} = 45.$$


This means the probability that the two cards that are chosen are coins will be:


$$p(\text{two coins}) = \frac{C_{10,2}}{C_{40,2}} = \frac{45}{780} \approx 0.0577 \ (5.77\%).$$


The probability is the same as if we wished to obtain two cards of any other
suit (cups, swords or clubs). This allows us to answer another question: what is the
probability that the two cards chosen belong to different suits? Here it is easier to answer
the question indirectly by considering the opposite event. In this case, the opposite
is that they belong to the same suit (the sum of the probabilities that both are from
each of the four suits), giving:


$$p(\text{different suits}) = 1 - p(\text{same suit}) = 1 - [p(\text{two coins}) + p(\text{two cups}) + p(\text{two swords}) + p(\text{two clubs})]$$
$$= 1 - [0.0577 + 0.0577 + 0.0577 + 0.0577] = 1 - 4 \cdot 0.0577 = 1 - 0.2308 = 0.7692 \ (76.92\%).$$
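Both card calculations can be reproduced with Python's standard `math.comb` (available from Python 3.8) and checked by brute force over every pair of cards. The sketch below is illustrative; it works with exact fractions, so 45/780 appears in lowest terms as 3/52, and the deck is modelled simply as (suit, index) pairs.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

possible = comb(40, 2)             # ways to choose 2 cards out of 40: 780
favourable_one_suit = comb(10, 2)  # ways to choose 2 cards out of one 10-card suit: 45

p_two_coins = Fraction(favourable_one_suit, possible)
p_same_suit = 4 * p_two_coins        # the four suits give mutually exclusive events
p_different_suits = 1 - p_same_suit  # complementary event

# Brute-force check: every unordered pair of (suit, index) cards is equally likely.
deck = [(suit, i) for suit in range(4) for i in range(10)]
pairs = list(combinations(deck, 2))
same = sum(1 for a, b in pairs if a[0] == b[0])
assert Fraction(same, len(pairs)) == p_same_suit

print(p_two_coins, f"{float(p_two_coins):.4f}")              # 3/52 0.0577
print(p_different_suits, f"{float(p_different_suits):.4f}")  # 10/13 0.7692
```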
Until now, we have seen situations in which the probability of an event was
obtained theoretically, based on a constructed model (darts, cards, etc.). However,
in the majority of real-life situations it is not possible to build a theoretical model
and deduce the probability of each of the events. Yet it is still useful to have a
probability, which, let us remember, is a measure of the possibility that something
might occur. What can we do?

Let us consider an example. It is a relatively recent practice for newspapers to 
provide weather forecasts in terms of probability (as a percentage). We may now 
find that the forecast of rain for tomorrow is 60%, whereas before we were only 
told whether it would rain or not. What is this percentage telling us? Will it rain in 
60% of the country or 60% of the time? Or has the newspaper asked 10 weather 
forecasters, six of whom have forecasted rain, whereas the other four have not? 

In fact, the answer is slightly more complex. It is the frequency of the past 
occasions on which it has rained under weather conditions that are substantially 
similar to those forecast for tomorrow, taking into account the available information. 
Therefore, we are talking about the level of uncertainty associated with the statement 
‘it will rain tomorrow’ and this is based on complicated calculations carried out 
using a large mass of observed data. In this respect, to talk of tomorrow's weather in 
terms of the probability of rain is much more precise than simply saying whether 
it will rain, since it contains the information required for planning the activities of 
individuals during the day. 

This measurement of probability differs from that used in games or lotteries
in that the probability is not assigned using a model but by using statistical
data. This is also the case in many other areas (the effectiveness of medications or
vaccinations, the safety of different methods of transport, etc.). This is why statistics,
which is used to study society, is so closely linked to probability: it allows us to
estimate the probabilities of events.


Compound experiments 


In terms of probability, choosing two cards from a pack is the same as choosing
one, and then a second without replacing the first. It is a repetition of our experiment
‘taking a card from a pack’, although the two draws do not use the same pack (the
second is made from a pack with one card fewer). When it comes to analysing phenomena
of chance, situations involving the repetition of the same experiment a number of times,
or others that consist of several different experiments, are common. These are
referred to as compound experiments.


When it comes to counting probabilities for experiments that can be broken
down into simple experiments, a good way of organising the information is as a
tree diagram. To do so, we need to follow some rules that we shall illustrate by
considering tossing two coins.




1. First, we identify the trials or observations that make up the experiment and
establish an order: 1st coin, 2nd coin, 3rd coin... In the case of tossing two
coins, we establish the order 1st coin, 2nd coin. The order can be justified in a
number of ways: the experiment does not change if we toss one coin first and then
the other, regard the two coins as distinguishable, or distinguish between the coin
that lands closest to us and the one that lands furthest away, etc.

2. Once the trials have been identified and ordered, we analyse the possible results
of the first (the possible results of tossing the first coin) and represent them on the
tree. The initial vertex will give rise to two branches, which represent the events
‘heads’ and ‘tails’.

3. We analyse the results of the second trial, based on each of the possible
results of the first, and represent them in the diagram. Each of the two vertices
will give rise to another two branches, which will represent the events
‘heads’ and ‘tails’ for the second coin.

4. If there are more trials in the experiment, we analyse the results of the third,
based on each of the possible results of the second, and so on, until completing
all the trials that make up the experiment.

5. The possible results of the experiment will be defined by the various branches of the
tree. Each branch is composed of various sections (there will be as many
sections as the number of trials in the experiment).


[Tree diagram: the 1st coin branches into H and T; each of these branches into H and T for the 2nd coin, giving the results HH, HT, TH and TT.]
For the case of coins, the tree provides us with all the possible cases, and we can
count the cases favourable to the event in question (two heads, a head and a tail, etc.)
and calculate the probability of each. The tree has four branches or paths, that is, four
possible cases, and each path represents an equiprobable event. Two of these represent
the event ‘a head and a tail’ (HT and TH), hence its probability will be:


$$p(\text{a head and a tail}) = \frac{2}{4} = \frac{1}{2}.$$


We have been able to carry out these calculations because all the events that are
represented are equiprobable. When phenomena grow more complex, we must assign
to each section of the branches of the tree the probability of the event it
represents and use these to calculate the probability of an event represented by one
or more branches. Although this is not necessary in our case, we will do so to provide
an example. The probability of each of the two initial branches of the tree will
be 1/2, and the branches for tossing the second coin will once again be those of an
unbiased coin:


[Tree diagram: as above, with probability 1/2 on each branch for the 1st coin (H, T) and on each branch for the 2nd coin (H, T); results HH, HT, TH and TT.]


If we toss two coins a large number of times, with what proportion do we expect 
heads on both coins? Approximately one in four: the probability we must assign 
to the path HH is 1/4. Without a tree, in this simple case it is easy to reach this 
conclusion: approximately half of the tosses will give a head for the first coin, and 
half of these will give a head for the second. Hence, we have two heads in half of 
half of the tosses — a quarter. These fractions are those represented on the branches 
of the path, and since a fraction of a fraction is the product of the fractions: 
$$p(\text{heads-heads}) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}.$$


Generalising, we can say that the probability of an event represented by a branch or 
path of the tree is obtained by multiplying the probabilities of the branches (edges) of which it 
is made up. 

How many tosses give a head and a tail? We must consider both cases: obtaining
a head for the first coin and a tail for the second (HT), and the opposite (TH).
The case HT (the second branch) will occur in half of half the tosses, and so will
the case TH (the third branch). The proportion of the occasions that give a head
and a tail will be the sum of both:


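$$p(\text{a head and a tail}) = \frac{1}{2} \cdot \frac{1}{2} + \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}.$$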


As such, the probability of an event represented by various branches or paths of the tree 
is obtained by calculating the probabilities of each of them and adding the results that have 
been obtained. 
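The two rules just stated (multiply along a path, add across paths) can be sketched in a few lines. The helper `path_probabilities` below is hypothetical: it takes one table of branch probabilities per trial and reuses the same table whatever happened before, which assumes the trials are independent, as they are for the two coins. It uses `math.prod` (Python 3.8+).

```python
from fractions import Fraction
from itertools import product
from math import prod

# Branch probabilities for one unbiased coin.
COIN = {"H": Fraction(1, 2), "T": Fraction(1, 2)}


def path_probabilities(stages):
    """Every path of the tree; a path's probability is the product of its branch probabilities.

    Assumes independent stages: each stage has fixed branch probabilities.
    """
    return {
        "".join(outcomes): prod(stage[o] for stage, o in zip(stages, outcomes))
        for outcomes in product(*stages)
    }


paths = path_probabilities([COIN, COIN])
print(paths)  # {'HH': Fraction(1, 4), 'HT': Fraction(1, 4), 'TH': Fraction(1, 4), 'TT': Fraction(1, 4)}

# An event covering several paths: add their probabilities.
p_head_and_tail = sum(p for path, p in paths.items() if sorted(path) == ["H", "T"])
print(p_head_and_tail)  # 1/2
```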

In this case, the result of the second experiment was independent of the first: 
regardless of the result of the first toss, in the second the possibilities for heads and 
tails are the same. We say that these are independent experiments. If the experiments
are independent, the probability of event $E_1$ occurring in the first, followed by event
$E_2$ in the second, etc. is:


$$p(E_1 \text{ and } E_2 \text{ and } \ldots) = p(E_1) \cdot p(E_2) \cdots$$


Hence, the probability that all the events occur is equal to the product of their individual 
probabilities. 

However, it is often the case that in a compound experiment the results of one 
trial influence the others. These are called dependent experiments. 

Laura must sit exams on 10 different subjects, but has only studied for 8. The
exam will include 3 subjects chosen at random. What is the probability that she
will be able to answer all 3 subjects correctly? To pass, it is necessary to answer
at least 2 correctly. What is the probability that she passes? In this
case it is necessary to observe the results of three trials, corresponding to the first,
second and third subjects. Each may correspond to material she has
studied, in which case she answers correctly (C), or material she has not studied, in
which case she answers incorrectly (I).

Let us draw a tree diagram to reflect the results of the trials making up the 
experiment. In this case, it doesn’t matter if the subject that comes up is the first 
from the syllabus, or the second or third... All we need to do is observe whether she 
will answer the questions on the subject correctly or incorrectly. Hence, from the 
initial vertex, we draw two branches representing the two possible results for the first 
exam. The probabilities will be 8/10 (C) and 2/10 (I). Let us now assume that Laura
has answered the first subject correctly (last vertex of the first edge), meaning there
remain 9 for the second (one has already come up), of which she has studied 7 (one
of the 8 that she had studied has already come up); hence, the probability for this
section of the tree will be 7/9. As there are still 2 she has not studied, the probability
of the other branch will be 2/9. We can analyse what happens if the result of the 
first subject was I in the same way. When it comes to assigning probabilities for the 
branches of the tree, we must bear in mind that the result of one trial influences the 
others. These are dependent experiments. 

Completing our analysis of the second exam, we continue to the third for each of 
the results. For example, for the first branch of the tree (C-C), the third subject can be 
selected from the 8 that remain, of which Laura has only studied 6: the probabilities 
of the two branches will be 6/8 and 2/8, as shown in the diagram: 


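[Tree diagram for Laura’s exam; each path with the probabilities of its three sections:
CCC: 8/10 · 7/9 · 6/8    CCI: 8/10 · 7/9 · 2/8
CIC: 8/10 · 2/9 · 7/8    CII: 8/10 · 2/9 · 1/8
ICC: 2/10 · 8/9 · 7/8    ICI: 2/10 · 8/9 · 1/8
IIC: 2/10 · 1/9 · 8/8    III: 2/10 · 1/9 · 0]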
We have now completed the diagram and can use it to answer our first question.
The probability of being able to answer all three subjects is given by the top branch
of the tree (CCC), and is the product of the probabilities of the three sections of
which it is made up:


$$p(\text{CCC}) = \frac{8}{10} \cdot \frac{7}{9} \cdot \frac{6}{8} = \frac{336}{720} = \frac{7}{15} \approx 0.467 \ (46.7\%).$$


In the case of dependent experiments, the probability that the event $E_1$ occurs
first, $E_2$ second, $E_3$ third... is:


$$p(E_1 \text{ and } E_2 \text{ and } E_3 \ldots) = p(E_1) \cdot p(E_2/E_1) \cdot p(E_3/E_1 \text{ and } E_2) \cdots,$$


where $p(E_2/E_1)$ represents the probability of $E_2$ conditional on $E_1$, or rather, the probability
that $E_2$ occurs, assuming $E_1$ has occurred (and similarly for the other probabilities).

To answer the second question, she passes in four of the eight paths of the tree: 
CCC, CCI, CIC, ICC. The probability will be given by the sum of the probabilities 
of these four paths, each of which is obtained by multiplying the probabilities of the 
sections of which they are made up: 


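$$p(\text{pass}) = \frac{8}{10}\cdot\frac{7}{9}\cdot\frac{6}{8} + \frac{8}{10}\cdot\frac{7}{9}\cdot\frac{2}{8} + \frac{8}{10}\cdot\frac{2}{9}\cdot\frac{7}{8} + \frac{2}{10}\cdot\frac{8}{9}\cdot\frac{7}{8} = \frac{336 + 112 + 112 + 112}{720} = \frac{672}{720} \approx 0.933 \ (93.3\%).$$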


It is worth emphasising that this means if there are 10 subjects in an exam, 3 of 
which are chosen randomly and 2 are required to pass, knowing only 8 (i.e. without 
studying 20% of the subjects) the probability of failing is only 6.7%. Perhaps it is 
worth carrying out a probabilistic analysis of exams before starting to study! It may 
represent the most profitable use of our time. Here we are referring to the possi- 
bilities of passing, not learning, and of course there is also the possibility of failing. 
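Laura’s tree can also be generated mechanically. The sketch below (function name and parameters are illustrative) walks every C/I path and, at each section, uses the probability conditional on what has already come up, which is exactly what makes these experiments dependent. It reproduces the values above with exact fractions, and the all-correct case is cross-checked against the combinatorial ratio C(8,3)/C(10,3).

```python
from fractions import Fraction
from itertools import product
from math import comb


def exam_paths(studied=8, total=10, drawn=3):
    """Probability of each C/I path when `drawn` subjects are picked without replacement.

    Each section's probability is conditional on the earlier sections of the path.
    Assumes drawn <= total.
    """
    paths = {}
    for path in product("CI", repeat=drawn):
        p = Fraction(1)
        known, unknown = studied, total - studied
        for outcome in path:
            favourable = known if outcome == "C" else unknown
            p *= Fraction(favourable, known + unknown)
            if p == 0:
                break  # impossible path, e.g. III when only 2 subjects are unstudied
            if outcome == "C":
                known -= 1
            else:
                unknown -= 1
        paths["".join(path)] = p
    return paths


paths = exam_paths()
p_all_three = paths["CCC"]
assert p_all_three == Fraction(comb(8, 3), comb(10, 3))  # 56/120 = 7/15
p_pass = sum(p for path, p in paths.items() if path.count("C") >= 2)
print(p_all_three, p_pass, 1 - p_pass)  # 7/15 14/15 1/15
```

The failing probability 1/15 ≈ 0.067 matches the 6.7% quoted above.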


Axiomatic definition of probability 


So far we have considered probability from an intuitive point of view. This has the
advantage that the arguments are shown as the result of a logical-deductive process
that can be easily grasped. However, this approach is not without its drawbacks,
above all when the set of elementary events is not finite or when the different
results are not equally likely, situations that occur frequently. Moreover, defining
probability as the limit of relative frequencies causes problems when it comes to its
practical application. If it is not possible to repeat an experiment an unlimited number
of times, how can we know if we have carried out enough repetitions to find the correct
relative frequency? That is a question without an answer.
At the start of the 20th century, the mathematical formalisation of probability
became a necessity for many mathematicians. In the 1930s Kolmogorov stated a series
of axioms that made an axiomatic formulation of probability possible. This axiomatic
approach, which satisfied the mathematical community, avoids giving a conceptual
definition of probability (and hence always having to think about experiments)
and instead specifies the properties that must be satisfied by the definition of the
probability of an event. Additionally, the axiomatic definition covers all the properties
we have discovered intuitively, meaning Kolmogorov succeeded in aligning formal
mathematics and experiments with random phenomena. Furthermore, Kolmogorov’s
axioms tied probability closely to another branch of mathematics intimately related
to it, measure theory.

Starting from Kolmogorov’s axioms and following a rigorous mathematical 
development similar to any other branch of mathematics, it is possible to establish a 
set of properties and consequences that are not evident and which could not have 
been reached in any other way. The importance of probability in the study of other 
scientific fields and its large number of applications bring us once again to the 
words of Laplace: “The most important matters in life are largely nothing more than
questions of probability.” Perhaps this is why probability is the branch of mathematics
with the greatest number of applications. 

The axiomatic definition of probability can be stated as follows. Given a sample
space E associated with a random experiment and the corresponding set of events, F,
the term probability refers to any way of assigning each event $E_i$ in F a numeric value
prob($E_i$), such that the following properties hold:


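1. For any event $E_i$ in F, $\text{prob}(E_i) \ge 0$.

2. The probability of the certain event is one: prob(E) = 1.

3. Given a collection of incompatible events $E_1, E_2, \ldots, E_n, \ldots$ in F:
$$\text{prob}(E_1 \text{ or } E_2 \text{ or } \ldots \text{ or } E_n \text{ or } \ldots) = \text{prob}(E_1) + \text{prob}(E_2) + \cdots + \text{prob}(E_n) + \cdots$$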
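On a finite sample space, one simple way to build a probability is to give each elementary outcome a weight and define the probability of an event as the sum of its weights. The sketch below (helper names are hypothetical) checks the three axioms by brute force over every event. In a finite space a collection of incompatible events has only finitely many non-empty members, so checking additivity two events at a time is enough; with this sum-of-weights construction additivity in fact holds automatically, and the check is there only to make the third axiom explicit. The real content of the test is non-negativity and total 1.

```python
from fractions import Fraction
from itertools import chain, combinations


def all_events(space):
    """Every subset of a finite sample space, i.e. the set of events F."""
    items = sorted(space)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def satisfies_axioms(weights):
    """Check the three axioms for prob(event) = sum of the weights of its outcomes."""
    space = frozenset(weights)
    events = all_events(space)

    def prob(event):
        return sum((weights[o] for o in event), Fraction(0))

    non_negative = all(prob(e) >= 0 for e in events)  # axiom 1
    certain_is_one = prob(space) == 1                 # axiom 2
    additive = all(prob(a | b) == prob(a) + prob(b)   # axiom 3, two incompatible events at a time
                   for a in events for b in events if not a & b)
    return non_negative and certain_is_one and additive


fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
bad_total = {1: Fraction(1, 2), 2: Fraction(1, 3)}  # weights add up to 5/6
print(satisfies_axioms(fair_die), satisfies_axioms(bad_total))  # True False
```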


As we mentioned above, taking these three properties as axioms, by means of a 
process of formal deduction it is possible to arrive at other important properties of 
probability. Let us consider some of these: 
1. $\text{prob}(\overline{E_i}) = 1 - \text{prob}(E_i)$, where $\overline{E_i}$ is the complementary event of $E_i$.
2. $\text{prob}(E_i) \le 1$.

3. $\text{prob}(\varnothing) = 0$, where $\varnothing$ represents the impossible event.

4. Given any two events $E_1$ and $E_2$ in F:


$$\text{prob}(E_1 \cup E_2) = \text{prob}(E_1) + \text{prob}(E_2) - \text{prob}(E_1 \cap E_2).$$
In the expression above, the symbol ∪ indicates that one event or the other (or both) must
occur and the symbol ∩ that both events occur simultaneously.
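Property 4 is easy to check on the die used earlier in the chapter. In this sketch events are represented as Python sets of faces (an assumption of the example), so ∪ and ∩ become the set operators `|` and `&`; the two events are ‘even number’ and ‘multiple of 3’, which share the face 6.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # an unbiased die: all faces equally likely


def prob(event):
    """Laplace probability of an event given as a subset of the sample space."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


even = {2, 4, 6}
multiple_of_3 = {3, 6}

direct = prob(even | multiple_of_3)  # count the union {2, 3, 4, 6} directly
formula = prob(even) + prob(multiple_of_3) - prob(even & multiple_of_3)
print(direct, formula)  # 2/3 2/3
```

Without the subtraction, the shared face 6 would be counted twice and the result would be 5/6 instead of 2/3.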


THE UNION-INTERSECTION THEOREM 


The expression that gives the probability of the union of events is generalised as follows.
If $A_1, A_2, \ldots, A_n$ are arbitrary events:


$$p(A_1 \cup \cdots \cup A_n) = p(A_1) + p(A_2) + \cdots + p(A_n) - p(A_1 \cap A_2) - p(A_1 \cap A_3) - \cdots - p(A_{n-1} \cap A_n) + p(A_1 \cap A_2 \cap A_3) + \cdots + p(A_{n-2} \cap A_{n-1} \cap A_n) - p(A_1 \cap A_2 \cap A_3 \cap A_4) - \cdots + (-1)^{n+1}\, p(A_1 \cap A_2 \cap \cdots \cap A_n).$$

Or rather, the individual probabilities are added together, the probabilities of the common
parts (intersections) of pairs of events are subtracted, the probabilities of intersections of three
events are added, and so on, alternating the signs.


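The alternating-sign rule can be written as a short loop over all groups of k events. This is an illustrative sketch, not an efficient method (it visits all 2^n − 1 groups of events), using sets of die faces as events and checking the result against the probability of the union computed directly; the helper names are ours.

```python
from fractions import Fraction
from itertools import combinations


def prob(event, space):
    """Laplace probability of an event (a subset of a finite, equiprobable space)."""
    return Fraction(len(event), len(space))


def union_by_inclusion_exclusion(events, space):
    """p(A1 ∪ ... ∪ An): add single events, subtract pairwise intersections,
    add triple intersections, and so on (sign (-1)**(k+1) for groups of k events)."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)
        for group in combinations(events, k):
            total += sign * prob(set.intersection(*group), space)
    return total


space = set(range(1, 7))
events = [{2, 4, 6}, {3, 6}, {1, 2, 3}]  # even, multiple of 3, at most 3
print(union_by_inclusion_exclusion(events, space))  # 5/6
print(prob(set.union(*events), space))              # 5/6
```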
Deceptive Situations


We are going to discuss a series of everyday situations related to probability in 
which the result (which can be found using the knowledge we have acquired in the 
previous chapters) contradicts what appears ‘obvious’. In some cases, we are given the 
probability and must find the situation, whereas in others things are the other way 
around. The good thing is that it is possible to carry out experimental tests (trials)
to confirm the theoretical results and show that our intuition is poorly equipped
when it comes to probability. The reader should prepare to be surprised.


Find the situation 


Sometimes, in order to analyse something in depth it is a good idea to consider the
opposite of what is normal. This is precisely what we shall do here: we propose certain
situations in which the probability is already known in order to investigate the
composition on which it is based.


Situation A. An urn contains white and black balls. Two players place a bet in 
which they will take two balls out of the urn. The first bets they are the same colour, 
the second that they are different. They would both like to have the same probability 
of winning. How many balls of each colour must be inside the urn? 


Situation B. John and Belle are playing with two unnumbered dice whose sides 
are painted blue and red. The game is simple — if the two faces facing upwards have 
the same colour, John wins; if they are different, Belle wins. We would like the game 
to be equal, in the sense that both players have the same probability of winning. 
However, one of the dice has five blue faces and one red one. How should the faces 
of the other die be painted? 


Situation C. We have the same number of white and black balls (for example 
10) and two identical urns. The problem involves distributing these balls in the 
urns (neither can be empty) such that if one of the urns is chosen at random and a ball is
taken out of it, the probability of it being white is as high as possible. What is that
probability? Will it be greater than 1/2 (or 50%) under certain circumstances?


Solution A. At first glance, it would seem we need the same number of balls of 
each colour. However, this is not the case, as we can see by calculating the probability. 
One possible solution is to have 3 balls of one colour (white) and 1 of the other (black).
We can see that the probabilities are the same:


The probability that the first is white is 3/4, since the white balls make up three of
the four; the probability of the second being white after the first was white is 2/3, since
two of the three remaining balls are white. The probability of removing two white balls is
therefore 3/4 × 2/3 = 1/2. With a single black ball, two balls can only match if both are
white, so the probability of removing different coloured balls (the only possible
alternative) is also 1/2.
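A brute-force search shows that 3 white and 1 black is not the only fair urn. The sketch below (our own check, assuming at least one ball of each colour and at most 20 balls in total) computes the probability of a matching pair drawn without replacement for every composition. Algebraically, the fair compositions are those with (w − b)² = w + b.

```python
from fractions import Fraction

def p_same_colour(white, black):
    """Two balls drawn without replacement: probability that they match in colour."""
    n = white + black
    same = white * (white - 1) + black * (black - 1)   # ordered same-colour pairs
    return Fraction(same, n * (n - 1))

fair_urns = []
for n in range(2, 21):                    # total number of balls
    for black in range(1, n // 2 + 1):    # at least one black ball, white >= black
        white = n - black
        if p_same_colour(white, black) == Fraction(1, 2):
            fair_urns.append((white, black))

print(p_same_colour(3, 1))   # 1/2
print(fair_urns)             # [(3, 1), (6, 3), (10, 6)]
```

The fair urns pair consecutive triangular numbers (1, 3, 6, 10, ...), so an equal number of balls of each colour never gives a fair game.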


Solution B. The other die must have three faces of each colour, although this
seems like it cannot be true! We are going to find the probability that the faces are
the same colour, which will be the sum of the probabilities of both being blue and of
both being red. Each of these is the product of the probabilities of obtaining that
colour on each die. To summarise:

$$p(\text{same colour}) = p(\text{blue on both}) + p(\text{red on both}) = \frac{5}{6}\cdot\frac{3}{6} + \frac{1}{6}\cdot\frac{3}{6} = \frac{15}{36} + \frac{3}{36} = \frac{18}{36} = \frac{1}{2}.$$

And the other 1/2 is the probability that they are different colours.
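The calculation can be repeated for every possible second die. In this sketch `blue_second` is the hypothetical number of blue faces on the second die (0 to 6). The dice are assumed fair and independent, and the first die keeps its five blue faces.

```python
from fractions import Fraction

def p_dice_match(blue_first, blue_second, faces=6):
    """Independent fair dice: P(both blue) + P(both red)."""
    b1, b2 = Fraction(blue_first, faces), Fraction(blue_second, faces)
    return b1 * b2 + (1 - b1) * (1 - b2)

fair = [k for k in range(7) if p_dice_match(5, k) == Fraction(1, 2)]
print(fair)   # [3]
```

With three blue faces on the second die, the match probability is b1/2 + (1 − b1)/2 = 1/2 however the first die is painted, which explains why this choice works.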


Solution C. After reading the second question it is easy to think that since the
number of balls of each colour is the same, the result will be 50% or 1/2. However,
we need to consider whether there is a way of getting a better result. In fact, by
putting one white ball in one of the urns and the remaining 19 balls (9 white and 10 black)
in the other, we get:

$$p(\text{white}) = p(\text{urn 1})\cdot p(\text{white in urn 1}) + p(\text{urn 2})\cdot p(\text{white in urn 2}) = \frac{1}{2}\cdot 1 + \frac{1}{2}\cdot\frac{9}{19} = \frac{14}{19} \approx 0.7368,$$

that is, about 73.68%.


The increase obtained after a little thought is considerable! 
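To confirm that no other distribution beats this, we can try every split of 10 white and 10 black balls between the two urns. This sketch assumes, as in the calculation above, that each urn is chosen with probability 1/2. Here `w1` and `b1` are our own names for the white and black balls placed in the first urn.

```python
from fractions import Fraction

def p_white(w1, b1, white=10, black=10):
    """Pick one of the two urns at random (1/2 each), then a ball from that urn."""
    w2, b2 = white - w1, black - b1
    return Fraction(1, 2) * Fraction(w1, w1 + b1) + Fraction(1, 2) * Fraction(w2, w2 + b2)

candidates = [
    (p_white(w1, b1), (w1, b1))
    for w1 in range(11)
    for b1 in range(11)
    if 0 < w1 + b1 < 20          # neither urn may be empty
]
best_p, best_urn1 = max(candidates, key=lambda item: item[0])
print(best_urn1, best_p, float(best_p))
```

The maximum is the single white ball alone in one urn (or, symmetrically, the other 19 balls in the first urn), giving 14/19.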
Find the probability for a given situation


Birthday 


Situation A. First birthday problem. There are different ways of formulating this problem,
but they all amount to the same thing. The following version applies to parties with many
guests. ‘At a meeting there are N people who have come together by chance. What is the
probability that at least two of them share a birthday (i.e. they were born on the
same day of the same month)? Or, to put it another way, how many people must
there be for this probability to reach 1/2 (or 50%)?’


Situation B. Another birthday problem. This time we are not just looking for two 
people who were born on the same day, but for someone whose birthday coincides 
with mine. How many people are needed for the probability to be greater than 50%? 


Solution A. It is worthwhile pondering a little. How many people are required 
to be able to ensure that two birthdays coincide? It is sufficient to have 367 people, 
because each of the first 366 people can celebrate their birthday on a different day 
of the year (including 29 February), but the person who occupies 367th place must 
share a birthday with one of the previous people. Without accounting for leap 
years (as we will do from here on) it suffices to have just 366 people. When will the 
probability be 50%? It seems ‘evident’ to answer half the number of people (i.e. 183), 
but we cannot find an argument to justify this. 

Let us carry out some calculations and apply the definition. Let us assume there 
are 365 days in the year and calculate the probability that there are no coincidences, 
which is simpler. We can then subtract the answer from 1 (or 100 if we are dealing 
with percentages) to find the required probability. 

Let us consider a group of N people. We choose one at random, whose birthday 
may fall on any of the 365 days, then a second, the third and all the others until 
reaching N people. Hence, the number of possible cases that can arise is: 


$$PC\ (\text{possible cases}) = 365 \cdot 365 \cdot 365 \cdots 365 = 365^N.$$


We now consider how many of these $365^N$ possible cases do not entail a shared
birthday. To do so, we count the cases in which no birthday is repeated. There are
365 ways of choosing the date of the first person, 364 for the second, 363 for the
third... all the way down to the N-th person, whose birthday can fall on any of
$365 - (N - 1)$ days.
Hence, the number of possible ways of choosing the birthdays of N people so that
none share a birthday is:

$$FC\ (\text{favourable cases}) = 365 \cdot 364 \cdot 363 \cdots (365 - N + 1) = \frac{365!}{(365 - N)!}.$$


The probability that no pair of birthdays coincides is

$$\frac{365!}{365^N\,(365 - N)!}.$$

As we are interested in the probability of the complementary event (that there
are at least two people whose birthdays coincide), its value will be:

$$p = 1 - \frac{365!}{365^N\,(365 - N)!}. \qquad [1]$$
When the value of [1] is calculated for different values of N the results can be
surprising. For example, when N = 50, p = 0.97: in a group of 50 people, there is a
97% probability that two share a birthday. For N = 23, p = 0.507: in a group of 23
people, there is already a probability of over 50% (exactly 50.7%) that at least two
share a birthday.
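Formula [1] is easy to evaluate numerically if it is rewritten as a product, (365/365)·(364/365)···((365 − N + 1)/365), which avoids computing 365! directly. The sketch keeps the text's assumption of 365 equally likely birthdays and ignores leap years.

```python
def p_shared_birthday(n, days=365):
    """Formula [1], computed as 1 - (365/365)(364/365)...((365-N+1)/365)."""
    p_all_different = 1.0
    for k in range(n):
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different

smallest = next(n for n in range(1, 367) if p_shared_birthday(n) > 0.5)
print(smallest, round(p_shared_birthday(23), 4), round(p_shared_birthday(50), 4))
# 23 0.5073 0.9704
```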
The following table expresses the values of the probability (as decimals and
percentages) for different numbers of people, applying formula [1]:

N     p(N)      %
23    0.50730   50.7
25    0.56870   56.9
30    0.70632   70.6
35    0.81438   81.4
40    0.89123   89.1
50    0.97037   97.0
55    0.98626   98.6
60    0.99412   99.4
65    0.99768   99.8

The table shows that in groups of 60 people, we are ‘almost’ certain to find two
who share a birthday. (If we try the experiment in 1,000 groups chosen at random, it will
occur in about 994 of them.) However, we must take great care when we use the word ‘almost’,
FOOTBALLERS’ BIRTHDAYS


As there are 22 players and a referee in a football match, the probability of at least two of 
them sharing a birthday is greater than 50%. If we include the two assistant referees (making 
25 people), the probability increases to 56.87%. If we consider the 50 people involved in 
two matches (the semi-finals of a competition), the probability is over 97%. The good thing 
is that this is not just a theoretical musing, but can be verified. All we need is to examine the
records for the matches of any league. Nowadays, the Internet makes this, if not easy, at least 
feasible. The phenomenon has been confirmed in certain cases. For example, R. Matthews 
and F. Stones showed that of the 10 matches played in the English Premier League on 19 
April 1997, the birthday coincidence occurred in 6, and did not occur in 4. In fact in two of 
the 6 cases, something even rarer occurred — there were two pairs in each! 

Another example is the 16 national football teams that competed in Euro 2008. Each of 
these is made up of 23 players, and eight (half) had pairs of players who shared a birthday: 
Turkey, Switzerland, Germany, Greece, Austria, France, Russia and Sweden. In fact, there 


were more coincidences. In four (the last ones in the previous list) there was not one but two
coinciding pairs. A major coincidence? Is it the case that the players are chosen so that their 
birthdays coincide? In fact, in this respect the football players are not special, it's just that the 
information about them is readily available. The same thing happens with any other group. 


since we can only really be sure for groups of 366 people. What this tells us is that 
if we consistently bet on this probability, we will win with considerable frequency; 
however, it is not advisable to bet all our money on a single case, since there is also 
the possibility of losing. 


Solution B. In this case it is much harder to find a coincidence and we need
many more people. The probability that any one of the other people does not share my
birthday is 364/365, meaning that if there are N people in the room (including me),
the probability that none of the other N − 1 shares my birthday is

$$\left(\frac{364}{365}\right)^{N-1},$$

so the probability that there is someone who shares my birthday is one minus this value:

$$p = 1 - \left(\frac{364}{365}\right)^{N-1}.$$


We want p to be at least 0.5, which is far from the case for N = 23, as above (for this
value p = 0.058571, less than 6%); this first happens when N = 254 people (in which
case p = 0.5005), a result that lies closer to our intuition than in the previous case.
Perhaps we are too egocentric and were unconsciously thinking it was our own 
birthday that was the important factor! 
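The search for the smallest group can be automated. This sketch counts N as including ourselves, as in the formula above, and assumes 365 equally likely birthdays.

```python
def p_someone_shares_my_birthday(n, days=365):
    """N people in the room including me: 1 - (364/365)**(N - 1)."""
    return 1 - ((days - 1) / days) ** (n - 1)

needed = next(n for n in range(1, 10_000) if p_someone_shares_my_birthday(n) >= 0.5)
print(needed, round(p_someone_shares_my_birthday(23), 4))   # 254 0.0586
```

Each extra person adds only a 1/365 chance of matching one fixed date. In the first problem, by contrast, every new pair of people is another chance for a coincidence.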


The drunkard’s walk 


Before the invention of breathalysers, a classical test for detecting drunk drivers was 
making them walk in a straight line. This is something we can do easily under normal 
circumstances, but which becomes problematic if our faculties have been altered by 
the effects of alcohol or other substances — and also if we have problems keeping 
our balance as the result of illness. This is referred to as the ‘drunkard’s walk’. After 
taking a step in one direction, the following step can be taken at random in any 
other direction, even backwards, returning to the starting point; this same pattern 
is repeated for each step. 


Three possible paths of ten steps that may be taken by the drunkard from the lamppost L. 


Let us now consider another problem. If we are walking in a straight line and 
advance by one metre for each step, after N steps we will be N metres from the 
starting point. However, if we walk the drunkard’s walk, how many steps will we 
need to advance the same distance of N metres? Or, put another way — after N 
drunkard’s steps, how far away will we be from the starting point? 

While we consider the problem, it is worth noting that, in addition to its 
entertainment value, it is also useful, for example, for constructing models for the 
diffusion of heat and being able to understand why a room takes a certain time to
heat up when the heating is switched on, with the radiator burning hot and the air
close to it at a reasonable temperature, while it is still cold a little further away.
As they grow warmer, the air molecules move faster, and they move randomly, like the
staggering drunkard.

Let’s get back to the question. We can construct simple models to answer it, for
example by tossing a coin to decide whether to move forward or backward. In that
case, to end up N metres from the starting point, around N² steps are required on
average: to move forward 10 metres, about 100 steps are required (100 = 10²). That is
many, as we can see, and the lack of proportionality grows as the distance increases:
some 2,500 steps are required to travel 50 metres. Now we can understand why it takes
so long for a radiator to heat a room. Furthermore, in addition to the difficulty of 
moving, we should note that the molecules grow colder as the distance increases. 
We can draw positive conclusions from walking around drunk! 
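The N² rule can be checked with the coin-tossing model: one toss per step, one metre forward or back. A fact about this model, not stated in the text, is that the average of the squared distance after N steps is exactly N. The typical (root-mean-square) distance is therefore √N metres, which is why covering N metres takes about N² steps. The sketch computes this average exactly for 10 steps and estimates it by simulation for 100 steps. The seed and trial count are arbitrary choices.

```python
import random
from fractions import Fraction
from itertools import product

def mean_square_distance_exact(n_steps):
    """Average of (final position)**2 over all 2**n equally likely +1/-1 walks."""
    total = sum(sum(walk) ** 2 for walk in product((-1, 1), repeat=n_steps))
    return Fraction(total, 2 ** n_steps)

def mean_square_distance_simulated(n_steps, trials=5_000, seed=1):
    """Monte Carlo version: one coin toss per step (forward or backward one metre)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        position = sum(rng.choice((-1, 1)) for _ in range(n_steps))
        total += position ** 2
    return total / trials

print(mean_square_distance_exact(10))        # 10
print(mean_square_distance_simulated(100))   # close to 100
```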


Other situations 
Cat and Mouse 


This is a somewhat surreal probability game due to the nature of its players. There
is a cat and a mouse positioned in the boxes marked with their names on the board.
In line with their instincts, the cat wants to catch the mouse, and the mouse tries
to escape. However, in this case we are dealing with civilised animals who agree
to follow the rules of the game. Each of the two will take a step at the same time
(which in both cases consists of moving to an adjoining square, horizontally or
vertically, but not diagonally), at random. The cat can move to the right or up; the
mouse can move down or to the left. If at any point they end up in the same box,
the cat eats the mouse; if they manage to switch positions without this happening,
the mouse is saved. What is the probability that the cat will eat the mouse? What is
the probability that the mouse is saved?


Once we have ‘deciphered’ the game, we can make the board bigger, with the 
same conditions for the game and the same positions (both starting in opposite
corners), this time with a 4 × 4 board, then a 5 × 5 board, and so on.
Solution. The two outcomes are opposite events, so they are complementary and their
probabilities add up to 1: if p (or P%) is the probability that the cat eats the mouse,
1 − p (or 100 − P%) is the probability that the mouse survives. The movements of the cat and
the mouse can be modelled by tossing two coins (one for the cat and the other for 
the mouse), with one of the possibilities (e.g. heads) corresponding to a horizontal 
movement and the other corresponding to a vertical one. The only meeting points 
are those of the main diagonal of the board (that which does not join the starting 
positions), and both need to land on the square at the same time. 

In the case of a 3 × 3 board, there are three squares in which the cat and
the mouse can meet; to reach them they must take two steps, meaning that the
probability that each reaches either of the two corner squares of that diagonal is
(1/2) × (1/2) = 1/4; the central square can be reached by two paths (left-down and
down-left for the mouse and similarly but reversed for the cat), so the probability is
double: 2/4 = 1/2. The probability of meeting in a corner square is (1/4) × (1/4) = 1/16,
whereas for the centre square it is (2/4) × (2/4) = 4/16. The probability for the set of
the three possible squares where they can meet is 1/16 + 1/16 + 4/16 = 6/16 = 3/8
(or 37.50%). Hence, the probability of the mouse surviving is 10/16 = 5/8 (or
62.50%). The good instincts of civilised beings are likely to come true.

We leave it up to the reader to calculate the probabilities for boards with more 
squares, but note that as the size increases, the probability that the mouse is saved 
increases significantly. 
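For readers who want to check their answers for larger boards, here is a sketch under stated assumptions. The cat starts at (0, 0) and the mouse at (n − 1, n − 1), and at each step each animal picks one of its two moves with probability 1/2. Before the meeting diagonal neither animal ever reaches a wall, so both moves are always available. The brute-force count is compared with a closed form we derived for this sketch: the sum of C(n − 1, k)² over 4^(n − 1), which equals C(2n − 2, n − 1)/4^(n − 1).

```python
from fractions import Fraction
from itertools import product
from math import comb

def meet_probability_brute(n):
    """n x n board: the cat (R/U moves) and the mouse (L/D moves) can only share a
    square after n - 1 steps each, so enumerate every pair of (n-1)-step paths."""
    steps = n - 1
    meetings = 0
    for cat_path, mouse_path in product(product("RU", repeat=steps),
                                        product("LD", repeat=steps)):
        cat = (cat_path.count("R"), cat_path.count("U"))
        mouse = (steps - mouse_path.count("L"), steps - mouse_path.count("D"))
        meetings += cat == mouse
    return Fraction(meetings, 4 ** steps)

def meet_probability(n):
    """Closed form: C(2n - 2, n - 1) / 4**(n - 1)."""
    return Fraction(comb(2 * n - 2, n - 1), 4 ** (n - 1))

for n in range(3, 7):
    p = meet_probability(n)
    print(f"{n} x {n}: cat eats mouse {p}, mouse saved {1 - p}")
```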


Large families 

In Europe it is common for families to have few children, to the extent that it is 
unusual to find one with four. However, previous generations were more prolific and 
birth rates continue to be high in many places throughout the world. It is known 
that the probability of giving birth to a boy or a girl is practically the same. However, 
in families with four children, which is more likely: two boys and two girls, or three 
children of one sex and one of the other? 


Solution. The situation can be simulated by tossing four coins, with heads
corresponding to one sex and tails to the other. We can then use a tree to represent the
possibilities of heads or tails for each of the four children. Of the 16 branches of the
tree, 8 have 3 children of one sex and 1 of the other (probability 1/2 = 50%); 6 have
two of each sex (probability 3/8 = 37.5%) and the two remaining branches correspond
to when the four children are all of the same sex (probability 1/8 = 12.5%). If we
consider a large number of trials (for example, pooling the results from the families of
enough friends with four children), the results we obtain are very close to these probabilities.
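Listing the 16 branches of the tree by program gives the same counts. The sketch assumes, as in the text, that each birth is a boy or a girl with probability 1/2, independently of the others.

```python
from collections import Counter
from itertools import product

# Each of the 16 equally likely sequences of four births (B = boy, G = girl).
split_counts = Counter()
for births in product("BG", repeat=4):
    boys = births.count("B")
    split = tuple(sorted((boys, 4 - boys), reverse=True))   # (3, 1) covers 3-1 and 1-3
    split_counts[split] += 1

for split, count in sorted(split_counts.items()):
    print(split, f"{count}/16")
```

The sorted pair merges "three boys, one girl" with "one boy, three girls", and the 3-1 split outnumbers the 2-2 split.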


Geometric probability 


Assumptions and reality 

Sometimes we assume things happen in a certain way, without giving the matter 
careful thought. However, reality is slippery and does not always conform to our 
thoughts. We are going to see this in the following cases: 


1. We have a 5 × 5 board which we want to paint using two colours, red (R)
and blue (B). The colour of each of the squares will be decided at random.
How do you think the board will look when completed? Fill in the result that
you are expecting here:


2. Now for the action: we'll use chance to check whether the results are similar to
your prediction. Take a coin and toss it for each of the squares. If
the result is heads (H), paint the square R; if it is tails (T), paint it B. Is this 
similar to what you predicted? If you tried the experiment with another 
person, compare their results too. 

3. Now imagine the previous board represents the slabs covering the garden 
of a house and it begins to snow so slowly that it is possible to count the 
snowflakes. After the first 100 have fallen, how do you think they will be 
distributed on the 25 slabs on the patio? 

4. Carry out a simulation of the position of the 100 snowflakes. The process 
is slow because it involves throwing two dice 100 times. The result of the 
first die gives the row, and the result of the second gives the column (in 
both cases we throw the die again if we get a six). Together the two give 
us the position of the slab onto which the snowflake has fallen. Is the result
close to what we expected? How many slabs have 0, 1, 2, 3, 4 or more 
snowflakes? Compare the results with a friend. 


Is it what you expected? Is there any similarity between assumptions and reality? 


As a general remark, one of the great difficulties when it comes to studying
random phenomena is the tendency to believe there are many rules or regularities
in the results. In example 1, roughly speaking, we tend to expect the squares to be
painted half and half, with few side-by-side squares painted the same colour. When it
comes to the snowflakes (example 3), we tend to expect each slab to receive around four
flakes (the average, 100/25), although this is not what happens in the simulation (or in
reality): the number of flakes varies considerably from slab to slab. On the other hand, if
the experiment is carried out by many friends and all the results are added together,
we will see that the greater the number of trials, the closer the results are to the
theoretical ones. (This follows from the law of large numbers, which we ‘intuitively’
but erroneously believe also holds for small numbers, with few trials.)
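The snowflake experiment of step 4 can be simulated by computer. In the sketch, `randrange(5)` stands in for a die that is re-thrown on a six: both pick one of five values with equal probability. The seed is an arbitrary choice. The theoretical comparison assumes each flake lands on any of the 25 slabs with equal probability. The number of flakes on a slab then follows a binomial distribution with 100 trials and success probability 1/25. Under that model only about 5 of the 25 slabs are expected to receive exactly four flakes, even though four is the average.

```python
import random
from collections import Counter
from math import comb

def simulate_snowflakes(flakes=100, rows=5, cols=5, seed=None):
    """Drop each flake on a uniformly random slab; return the count for every slab."""
    rng = random.Random(seed)
    per_slab = Counter((rng.randrange(rows), rng.randrange(cols)) for _ in range(flakes))
    return [per_slab[(r, c)] for r in range(rows) for c in range(cols)]

def expected_slabs_with(k, flakes=100, slabs=25):
    """Binomial model: expected number of slabs receiving exactly k flakes."""
    p = 1 / slabs
    return slabs * comb(flakes, k) * p**k * (1 - p) ** (flakes - k)

counts = simulate_snowflakes(seed=2024)
histogram = Counter(counts)
for k in range(10):
    print(k, histogram.get(k, 0), round(expected_slabs_with(k), 2))
```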


On a sphere 

Choose three arbitrary points on a sphere. What is the probability that the three
points are located on the same hemisphere (assuming the great circle bounding it
belongs to the hemisphere)?


Solution. Before tackling the problem, it is useful to remind ourselves of the 
geometry of space and the sphere. Given any three points, there is always at least one 
plane that passes through them — which is why a table can be held up by three legs 
not placed in a straight line. If they were placed in a straight line there would be an 
infinite number of planes passing through them. Furthermore, a plane that cuts a
sphere determines a circle, which is largest (i.e. its radius is equal to that of the
sphere) if the plane passes through the centre and smaller where this is not the case.
In the latter case, the whole circle lies in one of the hemispheres obtained by cutting
the sphere with a parallel plane passing through the centre. If the plane passing through
the three points passes through the centre of the sphere, the three points are on a great
circle bordering a hemisphere; otherwise they are inside one. However, in both cases,
the three points are on the same hemisphere.

Hence, the unexpected answer is that three arbitrarily chosen points are always
on the same hemisphere: the probability is 1.
Ancient marriages


Marriage in Machuria 
In a land called Machuria (not by coincidence, and not located in a specific 
geographic area, but widespread throughout space and time — the land of machismo), 
when a girl wishes to get married, she must ask permission. She goes with her suitor 
to the palace of the ‘chief’ who places six pieces of string of equal lengths into her 
closed hand, their ends protruding from both sides; her suitor must join them (by 
tying knots) in pairs on each side, without the girl opening her hand, such that he 
is unable to see the ends of each length of string; when the six knots have been 
tied, the girl opens her hand: if the string comes out in the shape of a ring they 
can marry; otherwise they must postpone the wedding. 

After a year has passed, they have another opportunity. If the result is also negative,
they cannot try again and are not able to get married. Can we evaluate how
difficult it is to get married in Machuria?


Solution. At first, it seems the ‘chief’ of the territory is not favourably disposed 
towards marriages between his subjects, because the probability of the pieces of string 
coming out in a ring is ‘obviously’ very small. However, looks can be deceiving. 

Let us calculate the probability for the first year, counting the favourable cases
(FC) and possible cases (PC). We tie the ends of the strings protruding from one side


To be able to get married in Machuria, the woman who is to wed must hold the lengths of string 
as shown in the top left image, while her suitor ties the ends together. If the result gives a ring (top 
right), the wedding may take place. Otherwise (bottom photographs) it must be postponed. 
of her hand randomly, since this does not affect what will happen when we come to
tie the other side, which is the side upon which we shall fix our attention. The possible
ways of tying the strings are as follows: for the first knot, we choose one end, which we
can tie to any of the other five (hence 5); once this has been done we choose another
end, which can be tied to any of the three ends left; once we have tied the second
knot, only two loose ends remain, and hence the only possibility is to tie them together.
Hence, PC = 5 · 3 · 1 = 15. How many cases give rise to a single ring? These are the FC.

Once we have chosen one of the ends, we can tie the first knot to any of the other
ends except the one belonging to the string already joined to it on the first side (which
would close a small ring of two strings), hence there are four possibilities. For the
second knot, whichever end we choose, there is exactly one end we must avoid: the other
free end of the piece it already belongs to, since tying them would close a ring too
early and leave the remaining strings apart. Hence there are two cases; for the third knot
only two ends are left, hence just one option. This means that FC = 4 · 2 · 1 = 8. The
probability of getting married in the first year is:

$$p = \frac{FC}{PC} = \frac{8}{15} \approx 0.53\ (53\%).$$


The result is surprising: the first attempt is successful in more than half of the 
cases. 

Let us now consider the case where two years are required. The probability of 
not obtaining the ring in year one is: 


$$1 - p = 1 - 0.53 = 0.47.$$


The probability of not obtaining it in either of the two years (we do not obtain 
it on either of the two attempts) is the product of the probabilities: 


$$(1 - p)\cdot(1 - p) = 0.47 \cdot 0.47 \approx 0.22.$$


This indicates that if we submit to the test two years in a row the probability of 
not obtaining a ring in either of the two years is approximately 22%. Hence, 78% 
of those who try to get married over two years are able to do so. 

There is also another way of considering the probability in the second year. The
probability of obtaining a single ring on the second attempt is once again 53%, but
we should only consider the 47% of the couples that are unlucky in the first year
(the only ones to repeat the test). Hence, since 0.53 · 0.47 ≈ 0.25, in the second year
25% of all couples (those who have already presented themselves and failed) will
obtain the ring; hence only 47% − 25% = 22% will be unable to marry.

Whichever way we look at it, and in contrast to what seems ‘evident’, it is not 
that difficult to marry in Machuria! 
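The counts FC = 8 and PC = 15 can be confirmed by listing every way of tying the loose ends. In this sketch (our own model of the game) each string is a node. The knots on the first side are fixed as 0–1, 2–3, 4–5, which by symmetry does not affect the answer. Every pairing of the ends on the other side is then generated, and the program follows the knots alternately from side to side to see whether all the strings form one ring.

```python
from fractions import Fraction

def perfect_matchings(ends):
    """Yield every way of tying the loose ends in `ends` together in pairs."""
    if not ends:
        yield []
        return
    first, rest = ends[0], ends[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching

def is_single_ring(left, right, n_strings):
    """Walk from string 0, crossing knots alternately on the two sides,
    and check whether every string is visited before getting back to 0."""
    current, use_left, length = 0, True, 0
    while True:
        current = (left if use_left else right)[current]
        use_left = not use_left
        length += 1
        if current == 0:
            return length == n_strings

def single_ring_probability(n_strings):
    # Knots on the first side are fixed: string 0 with 1, 2 with 3, and so on.
    left = {}
    for s in range(0, n_strings, 2):
        left[s], left[s + 1] = s + 1, s
    possible = favourable = 0
    for matching in perfect_matchings(list(range(n_strings))):
        right = {}
        for a, b in matching:
            right[a], right[b] = b, a
        possible += 1
        favourable += is_single_ring(left, right, n_strings)
    return Fraction(favourable, possible), favourable, possible

p, fc, pc = single_ring_probability(6)
print(fc, pc, p)            # 8 15 8/15
print(float((1 - p) ** 2))  # chance of failing two years running
```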


Marriage in Remachuria 

Remachuria, a territory close to Machuria, has an even more reactionary chief, who
wishes to make the conditions for getting married stricter than those in Machuria.
He will apply the same procedure, but instead of six pieces of string, there are now
eight. Do you think it will be much harder to get married in Remachuria?
Following the same logic as for Machuria, the probability for the first year will be:

$$p = \frac{FC}{PC} = \frac{6 \cdot 4 \cdot 2 \cdot 1}{7 \cdot 5 \cdot 3 \cdot 1} = \frac{48}{105} \approx 0.457\ (45.7\%).$$

In two consecutive years, the probability of not obtaining the ring on both
occasions will be:

$$(1 - p)\cdot(1 - p) = \left(\frac{57}{105}\right)^2 \approx 0.295\ (29.5\%).$$

This is to say that about 70.5% can get married. The chief of Remachuria was
ignorant and not very clever when it came to probability!
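The counting argument generalises to any even number of strings: PC = (n − 1)(n − 3)···1 and FC = (n − 2)(n − 4)···2. The sketch below evaluates this product formula for both territories, together with the probability of failing in two consecutive years. The function name is our own.

```python
from fractions import Fraction

def ring_probability_formula(n_strings):
    """FC/PC from the counting argument: PC = (n-1)(n-3)...1, FC = (n-2)(n-4)...2."""
    pc = fc = 1
    for k in range(n_strings - 1, 0, -2):
        pc *= k
    for k in range(n_strings - 2, 0, -2):
        fc *= k
    return Fraction(fc, pc)

for strings in (6, 8):
    p = ring_probability_formula(strings)
    fail_twice = (1 - p) ** 2
    print(strings, p, round(float(p), 3), round(float(fail_twice), 3))
# 6 8/15 0.533 0.218
# 8 16/35 0.457 0.295
```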


Other situations 


Winning at tennis 

John and Anna are friends, children of two families who are friends and keen tennis 
players. They want their parents’ permission to go on a trip, but their parents, also 
keen tennis players, do not see eye to eye and decide to settle the matter by means of 
a series of matches. If they win the challenge they can go on the trip. Their mothers 
tell them: “Decide between yourselves who wishes to play against us. The person 
that is chosen will have to play three games against us, switching opponent for each 
game. If you win two games in a row, you can go on the trip”. Of the two mothers, 
Anna’s is a much better player than John’s. Which of the two mothers should the better
child play against first to have the greatest probability of winning? 


Solution. We can make our decision based on ‘common sense’ (which as we have 
seen when it comes to probability is, like in other areas, the least common of senses)
or appeal to the calculation of probabilities, which does not seem complicated since
there are only two possible solutions: the first game is either played against John’s 
mother or Anna’s. However, there is the added difficulty that there are no numbers 
for us to refer to (even if just to carry out our calculations and feel like we are doing 
something). 

It would appear that the ‘obvious’ course of action is to play two games against 
John’s mother because she is the weaker player, making it easier to win. Let J be the 
probability of beating John’s mother and A be the probability of beating Anna’s. We
do not know the values of J or A, but since we know John’s mother is the weaker 
opponent, it is easier to beat her, meaning we can assume J is greater than A (J>A). 
We now draw the trees for the situation, to see the probabilities for each of the two 
sequences (John’s mother—Anna’s mother — John’s mother, or Anna’s mother — John’s 
mother — Anna’s mother) and compare the results. 

Consider the sequence John’s mother — Anna’s mother — John’s mother. They will be 
given permission in the following cases: 


1. Beating John’s mother and beating Anna’s mother (here there is no need
to play the third match because two consecutive matches have already
been won). The probability $P_1$ of this situation is $P_1 = J \cdot A$.

2. Losing to John’s mother in the first match, but winning the other two. In
this case, the probability $P_2$ will be $P_2 = (1 - J)\cdot A \cdot J$, and the probability $P$
of winning will be the sum of $P_1$ and $P_2$:

$$P = P_1 + P_2 = J \cdot A + A \cdot J \cdot (1 - J) = J \cdot A \cdot (1 + 1 - J) = J \cdot A \cdot (2 - J).$$


Now let us consider the sequence Anna’s mother — John’s mother — Anna’s mother. 
The children will win in the following cases: 


1. Beating Anna’s mother and beating John’s mother (the result of the third
match does not matter). The probability $R_1$ of this situation is $R_1 = A \cdot J$.

2. Losing to Anna’s mother in the first game, beating John’s mother and winning
the second game against Anna’s mother. The probability $R_2$ will be
$R_2 = (1 - A)\cdot J \cdot A$.

In this case, the probability $R$ of winning will be the sum of $R_1$ and $R_2$:

$$R = R_1 + R_2 = A \cdot J + J \cdot A \cdot (1 - A) = J \cdot A \cdot (2 - A).$$
To establish which case has the greater probability, compare P and R, which
only differ by the last factor. Since J > A, (2 − J) < (2 − A), so R > P. This is an
apparent contradiction, which goes against what was ‘obvious’: we should play
twice against the better player (Anna’s mother)! However, we would also have
arrived at this conclusion by thinking calmly and applying common sense, without 
calculations. In the series of three matches, the most important match to win is the 
second, because it is required to complete the sequence of two winning matches 
in a row, regardless of whether the first or third is won. Hence, it is best to play the 
second match against the weaker opponent. 
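Both formulas can be checked by listing all eight win/lose outcomes of the three matches. This sketch assumes, like the tree above, that the matches are independent. The values J = 4/5 and A = 3/10 are purely illustrative (any J > A would do).

```python
from fractions import Fraction
from itertools import product

def p_two_in_a_row(win_probs):
    """Exact probability of winning two consecutive matches out of three.
    win_probs[i] is the chance of winning match i (matches assumed independent)."""
    total = Fraction(0)
    for outcome in product((True, False), repeat=3):
        weight = Fraction(1)
        for won, p in zip(outcome, win_probs):
            weight *= p if won else 1 - p
        if (outcome[0] and outcome[1]) or (outcome[1] and outcome[2]):
            total += weight
    return total

J, A = Fraction(4, 5), Fraction(3, 10)   # illustrative values with J > A
P = p_two_in_a_row([J, A, J])   # John's mother, Anna's mother, John's mother
R = p_two_in_a_row([A, J, A])   # Anna's mother, John's mother, Anna's mother
print(P, R)                      # 36/125 51/125
```

The winning condition reduces to "win the middle match and at least one of the others", which is why the middle match should be the easier one.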


A wager: three counters 

We have three counters in an opaque box. One is white on both sides; another has 
a red cross on one side and is white on the other; the third has a cross on both sides. 
Somebody takes one of the counters and places it on the table to reveal one of its 
sides, which is white. The wager involves guessing the other face, without looking. 
On which type of face is it favourable to bet? Or are both the same? 


Solution. At first glance, it appears that the odds of the hidden face being white or
having a cross are equal: the probability is 50%. However, it is more likely that both
faces are the same. Why? (Recall the Monty Hall problem from Chapter 2, which
is similar.)

Let us analyse where this white face can come from. It could be either of the
sides of the counter that is white on both sides, or the white face of the ‘mixed’ counter.
Of the six faces that can be shown, three are white, and each of them is equally likely
to be the one on show. Two of these three white faces belong to the all-white counter,
whose hidden face is also white; only one belongs to the mixed counter, whose hidden
face has a cross. Consequently, the probability that the other face is white is 2/3, and
not 1/2 as might be expected. (For the same reason, if a cross had appeared, we should
have bet on another cross.) If we play the game many times, as in a television game
show, it is profitable to bet that the other face is the same as the one we can see.
In the long term we will win two out of three times.
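The same count of faces can be done mechanically. The sketch lists every (visible face, hidden face) pair, assuming each of the six faces is equally likely to end up on show. It then keeps the cases where a white face is visible.

```python
from fractions import Fraction

# The three counters as (side 0, side 1).
COUNTERS = [("white", "white"), ("cross", "white"), ("cross", "cross")]

# Every counter and every side: record (visible face, hidden face).
faces = [(c[side], c[1 - side]) for c in COUNTERS for side in (0, 1)]

hidden_when_white_shows = [hidden for shown, hidden in faces if shown == "white"]
p_other_white = Fraction(hidden_when_white_shows.count("white"), len(hidden_when_white_shows))
print(p_other_white)   # 2/3
```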


The coat problem 

A group of N young people go to a winter dance and leave their coats in the 
cloakroom as they enter in return for a numbered ticket. A long power cut occurs, 
and they leave the club in darkness, each grabbing a coat without looking at the 
number. What is the probability that none of the young people picks up their own
coat? Does it depend on the number N of people who attend the dance?


Solution. There are various ways of simulating the problem. It is easy to take 
numbered cards (for example up to 10), shuffle them, place them face down in a
row and turn them over to check whether any card lies in the position matching its
number. Repeating the experiment a number of times (it is advisable to carry
it out with other people to save time) gives an approximate idea of the probability.
If we change the number of cards (for example from 10 to 15), we can check whether it
depends on the number of people.

If we let $A_i$ be the event ‘person i picks up their own coat’, the union of the
events $A_1, A_2, A_3, \dots, A_N$ is the event A = ‘somebody picks up their own coat’,
whereas we need to find the probability of the complementary event: nobody does.
Hence, given p(A), the probability we require is 1 − p(A).

Applying the formulae for the calculation of probabilities gives the following 
general result for N people: 


$$p(A) = 1 - \frac{1}{2!} + \frac{1}{3!} - \frac{1}{4!} + \cdots + (-1)^{N+1}\frac{1}{N!}$$


Since p(A) depends on the inverses of factorial numbers (which, as we have 
already seen, grow extremely quickly), its value is practically constant as N becomes 
large (remember, for example, that 10! = 3,628,800, which makes 1/10! negligible)
and tends to: 


$$p(A) \approx 1 - \frac{1}{e} \approx 0.63,$$
where e is a number defined as a limit: 


$$e = \lim_{N\to\infty}\left(1 + \frac{1}{N}\right)^{N} \approx 2.718.$$


To summarise, the probability that nobody picks up their own coat is practically
independent of the number of people, and is approximately 1/e ≈ 0.37 (37%).
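
The formula for p(A) can also be evaluated exactly. The sketch below uses Python's standard `fractions` module to compute 1 - p(A) from the alternating sum of inverse factorials for several values of N, which shows how quickly it settles near 1/e.

```python
from fractions import Fraction
from math import exp, factorial

def p_nobody(n: int) -> Fraction:
    """1 - p(A), with p(A) = 1 - 1/2! + 1/3! - ... + (-1)**(n+1)/n!."""
    p_a = sum(Fraction((-1) ** (k + 1), factorial(k)) for k in range(1, n + 1))
    return 1 - p_a

for n in (2, 3, 5, 10, 15):
    print(n, round(float(p_nobody(n)), 6))
print("1/e =", round(exp(-1), 6))
```

For N = 2 the result is exactly 1/2 and for N = 3 exactly 1/3, which can be checked by listing by hand the ways of handing out two or three coats.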

Sticker collections

Collecting stickers is a time-honoured hobby enjoyed by successive generations of 
children and young people, with only the theme of the stickers changing. There has 
always been the suspicion that manufacturers release a large number of the different 
stickers, except for two or three, which cannot be found, no matter how many packs 
we buy. Leaving this alleged practice to one side, if all the stickers are printed in
equal numbers, how many should we expect to buy to complete the collection?


Solution. Let us assume we are dealing with a collection of 50 stickers. We can be
sure that the first sticker we purchase will be new; for the second, the probability
of it being new drops to 49/50; when we already have two, it drops to 48/50, and so
on. If we already have 40 different stickers, the probability that the next one we buy
is new is 10/50 (the 10 we still need out of the set of 50) = 1/5. Hence, in this case,
we must expect to buy 5 = 50/10 stickers (the inverse of 10/50) to obtain a new one.
The same thing happens at every stage, meaning that, intuitively, we must purchase
the following number of stickers in total:


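$$\frac{50}{50} + \frac{50}{49} + \frac{50}{48} + \cdots + \frac{50}{2} + \frac{50}{1} = 50\left(\frac{1}{50} + \frac{1}{49} + \frac{1}{48} + \cdots + \frac{1}{2} + 1\right)$$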


It should be noted that the expression above can be justified using probabilistic
arguments, meaning that in this case intuitive reasoning agrees with what probability
tells us. The sum in parentheses is what is referred to as a ‘harmonic number’, and
a good approximation of its value (which can be checked using a calculator) is 4.5.
This is to say that, without irregular practices in the manufacture of the stickers,
we should expect to purchase, on average, 50 × 4.5 = 225 stickers to complete the
collection. Is this our experience?
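The expected number of purchases in the sum above can be computed exactly and compared with a simulation. This Python sketch assumes, as the text does, that every sticker is equally likely to come up and that stickers are bought one at a time; the function names are illustrative.

```python
import random
from fractions import Fraction

def expected_purchases(n: int) -> Fraction:
    """n/n + n/(n-1) + ... + n/1, i.e. n times the harmonic number H_n."""
    return sum(Fraction(n, k) for k in range(1, n + 1))

def simulate_purchases(n: int, trials: int, seed: int = 1) -> float:
    """Average number of stickers bought until all n are owned, each sticker equally likely."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        owned = set()
        bought = 0
        while len(owned) < n:
            owned.add(rng.randrange(n))  # buy one sticker at random
            bought += 1
        total += bought
    return total / trials

print(round(float(expected_purchases(50)), 2))  # 224.96
print(round(simulate_purchases(50, 2_000), 1))
```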

In general, for a collection of N stickers, with N large, the sum in parentheses is
approximately 0.58 + ln N (where ln N is the natural logarithm of N, the value of which
can be found by pressing the appropriate key on a scientific calculator).
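The quality of the approximation 0.58 + ln N can be seen by comparing it with the exact harmonic number for a few values of N. The constant 0.58 is a rounded value of what is known as the Euler–Mascheroni constant, about 0.5772; the helper name `harmonic` is our own.

```python
from math import log

def harmonic(n: int) -> float:
    """H_n = 1 + 1/2 + ... + 1/n, the sum in parentheses for a collection of n stickers."""
    return sum(1 / k for k in range(1, n + 1))

for n in (50, 100, 1_000, 10_000):
    exact, approx = harmonic(n), 0.58 + log(n)
    print(f"N = {n:>6}: H_N = {exact:.4f}, 0.58 + ln N = {approx:.4f}")
```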

Fortunately, completing a collection is made easier by exchanging repeated stickers
with friends or acquaintances, whose collections have no reason to coincide with
ours. It is also possible to evaluate the saving in stickers according to the number of
people taking part in the exchange. However, this involves slightly more complex
reasoning, which lies beyond the scope of this book.




Chapter 5 


Draws and Lotteries 


Designing fair prize draws and lotteries (those in which all participants have the
same probability of winning) is slightly more complicated than it might seem at
first sight. We might think that this is something that does not affect us unless we
join in. In fact, throughout our lives we participate in many draws, in some cases
without even realising it, such as those used to select the members of juries for the
courts. Draws are also used, to give just a few examples, in the allocation of school
places; at times when housing is in short supply, in the distribution of social housing;
in deciding the examining boards for competitive examinations (which affects
officials); and in deciding the place or order of participation in such exams (which
affects candidates).

There are historical examples of unfair draws and poorly designed lotteries, some
of which have been studied and which we will look at. In all cases the bias was
accidental, with the designers believing the draw was correct, which goes to show
that in this field there are also difficulties when it comes to design and execution.
For this reason, we shall begin by considering several different draws, whose
analysis will leave us better equipped to understand them.

The word ‘lottery’ has its roots in the Italian word lotto, which has two meanings — 
a batch and fate. However, references to lotteries can be found in the Old Testament 
and even in China, where the method was used to finance the construction of the 
Great Wall. 

In Europe, the history of the lottery begins in 1498 in Portugal, when one was 
founded to help the destitute and contribute to the country’s financial requirements. 
One of the oldest lotteries in the world was founded in 1727 in Holland and still 
functions to this day. The country sought to restore public funds to finance its wars 
and pay for public works. 

In Spain the first lottery was established as a monopoly in 1763 during the reign 
of Charles III. The current Spanish National Lottery was born on 25 December 
1811, during the Spanish War of Independence (from France), as “a means of 
increasing the revenue of public funds without bankrupting the contributors” and 
was referred to by the public as the ‘Modern Lottery’ to distinguish it from the 
‘Primitive Lottery’. 

Children are inseparable from the Spanish lottery. Their innocence, as a guarantee
of impartiality, was the solution to the problem of being able to trust in the
fairness of the draws. Pupils at San Ildefonso, the oldest school in Madrid, have
announced the winning numbers (in song) since 1771.


Draws with few participants 


Draws with biased coins 


Draws using biased coins may not be very common, but they allow us to better 
understand the problems that accompany the design of draws. Imagine we have 
a coin and we wish to use it for a fair or equal draw involving two people. How 
should we proceed? The first possibility is that the coin is unbiased, meaning that the 
probability of obtaining heads or tails is exactly the same, in which case the solution 
is simple — toss for heads or tails. However, it is not possible to know that the coin 
is unbiased in advance, although if we give the matter some thought, we can devise 
a draw that is genuinely fair, regardless of the coin. 

Let us assume that the probability of obtaining heads (H) is p (which is unknown,
and which would be 0.5 if the coin were unbiased), meaning that the probability of
tails (T), the complementary event of the one above, will be (1 - p). What is the
probability of obtaining HT (heads-tails) in that order? As the tosses are independent
(i.e. the result of one does not influence the other), this is given by the product of
the probabilities:


$$\text{prob}(HT) = p \cdot (1-p).$$
And the probability of getting TH in that order? By the same reasoning as above,
$$\text{prob}(TH) = (1-p) \cdot p = \text{prob}(HT).$$


Hence, we now know how to conduct a fair draw involving two people using
any type of coin. The solution is to toss the coin twice: one of the players wins
with TH and the other with HT. The only drawback (minor though it may be) is
that the pairs of tosses giving HH and TT are useless; we cannot take them into
account and must continue tossing the coin until we obtain HT or TH. Here we should
note that discarding these cases does not change the fairness of the draw, since in
every round HT and TH remain equally likely. (We shall consider a similar case with a
well-designed draw later in this chapter.) Hence, any coin can be used for a fair draw
between two people. This applies not only to coins but also to any other instrument
with two possible results, even when the probabilities are highly uneven.
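The two-toss procedure is easy to simulate. In the sketch below, the coin's probability of heads (0.8 here, an arbitrary choice) is used only to generate tosses and is never consulted by the draw itself; pairs HH and TT are discarded exactly as described. A procedure of this kind is often credited to John von Neumann.

```python
import random

def biased_toss(rng: random.Random, p_heads: float) -> str:
    """One toss of a coin that lands heads with probability p_heads."""
    return "H" if rng.random() < p_heads else "T"

def fair_draw(rng: random.Random, p_heads: float) -> str:
    """Toss twice: HT means player 1 wins, TH means player 2 wins; HH or TT, toss again."""
    while True:
        first, second = biased_toss(rng, p_heads), biased_toss(rng, p_heads)
        if first != second:
            return "player 1" if first == "H" else "player 2"

rng = random.Random(7)
draws = 20_000
wins = sum(fair_draw(rng, 0.8) == "player 1" for _ in range(draws))
print(wins / draws)  # close to 0.5, although the coin lands heads 80% of the time
```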


Draws involving three or more people 


Let us now assume that the draw involves three people, one of whom proposes the 
following procedure. We prepare a bag (or urn) with three balls, one of which is 
white and two of which are black; it goes without saying that all the balls have the
same shape and texture to ensure they cannot be distinguished by touch. The three 
people take turns to remove a ball, which is not returned to the bag. The person 
who removes the white ball wins. Is this fair? If it is not, who has the advantage, the 
first, second or third player to extract a ball? Which turn would we prefer to pick 
the ball? 

In the interests of simplicity, let us assume the balls are numbered 0, 1 and 2 and
that the white ball is number 0. Let us see what happens when we calculate the
probability of each of the three participants winning. The order in which the three
balls are extracted will be one of the following six possible orderings:


012 021 102 120 201 210 


A simple visual check allows us to conclude that in two of the six cases, ball 0
will be extracted first, whereas in another two it will be extracted second and in
the remaining two, third. In other words, the turn in which a player draws does
not matter.

Using Laplace's rule, it is also clear that the probability of removing the white ball
first is 2/6 = 1/3 (two favourable cases, 012 and 021, out of the six possible cases
that correspond to the previous orderings), and the probability of extracting it in
second or third place is also 2/6.

Let us see what happens when we argue in terms of conditional probability. Given
that there are three balls, the probability that the first player wins is prob(1st) = 1/3.
For the second to win, the first must have extracted a black ball and the second the
white one. In other words, the first player must not have won (the probability of this
is 2/3), and the second must then choose the white ball when only one black and one
white ball remain (a conditional probability of 1/2). Hence, prob(2nd) = 2/3 · 1/2 = 1/3.
Finally, prob(3rd) = 1 - 1/3 - 1/3 = 1/3. If, instead of three, there were 100 or N
participants, we would use 100 (or N) balls and a similar argument, which would lead
us to the conclusion that the probability of removing the winning ball at any given
turn is 1/100 (or 1/N). A similar situation occurs if a teacher wishes to raffle a prize
among N students: the teacher writes a number between 1 and N on a piece of paper
and then asks each student, in alphabetical order, to guess a number between 1 and N
that has not already been said, awarding the prize to the student who guesses the
number written down. Once again, the order (in this case alphabetical) does not matter.
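Laplace's rule can be applied mechanically by listing every ordering of the balls. The short Python sketch below does this with `itertools.permutations`, treating ball 0 as the white ball, and returns the probability that the player in each turn draws it; the function name is our own.

```python
from fractions import Fraction
from itertools import permutations

def win_probabilities(n_balls: int) -> list[Fraction]:
    """Probability that the player drawing in each turn gets ball 0 (the white ball),
    counting favourable orderings out of all equally likely orderings (Laplace's rule)."""
    orders = list(permutations(range(n_balls)))
    return [Fraction(sum(order[turn] == 0 for order in orders), len(orders))
            for turn in range(n_balls)]

print(win_probabilities(3))  # [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
print(win_probabilities(5))
```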


Draws with many participants 


In order to analyse the difficulties of draws in which there are many participants, we 
are going to consider two historical events, both related to the military. In both cases 
the draws were organised with the intention of being fair, so that each participant 
would have the same probability of being chosen or excluded. 

The first example comes from the United States in 1970, at the height of the 
Vietnam War, in a climate that was unfavourable to fulfilling the large demand for 
soldiers. A military draw was organised to choose the young conscripts who would 
be called up for the war. It was in the participants’ interest to avoid selection. The 
draw was organised by putting each of the possible 366 days in the year (from 1 
January to 31 December, including 29 February) in a drum and removing them at 
random. The first date to come up was 14 September, followed by 24 April. These 
dates were then used to recruit soldiers born between 1941 and 1952, following 
the order of the dates that came up in the draw (first, those born on 14 September, 
then those born on 24 April, etc.). What do you think of this procedure? Is it fair? 

Let us now fast forward to Spain in 1997, when military service was obligatory
for all men of a certain age. A total of 165,342 people were called up, exceeding the
army's requirements and meaning there were 16,442 people who would not need
to provide military service, the so-called ‘excess quota’. In this case, being chosen in
the draw was desirable. The draw was organised as follows. A random number was 
first assigned to each of the 165,342 participants. Six drums were used for the draw, 
each of which had balls corresponding to a digit of the number to be selected. The 
system would be used to release 16,442 of the conscripts, whose numbers were 
determined sequentially, starting from the digits drawn (if they reached the end, they 
would start again from 1 until completing the required quantity). In the first drum
there were five balls with a 1 and five balls with a 0 (corresponding to the hundreds 
of thousands); in all the other drums there were 10 balls numbered from 0 to 9. If 
the second drum (tens of thousands) gave a number greater than 6 (which actually 
happened in practice) the procedure would be repeated for that drum. What do you 
think of this draw? Was it fair? 

Before we go on to reveal the rest of the details of the two military draws, let 
us consider another draw that was used for a period of time to allocate positions in 
institutions when demand outstripped supply, such as in some schools. A number 
was allocated based on the order of pre-enrolment, and one of the numbers was
drawn. If, for example, there were 50 places, they were allocated to the holder of the
number drawn and the 49 that followed. What is wrong with this draw? Imagine that
I wish to enrol. If I convince 30 friends to sign up before me and the numbers for 
the draw are allocated by order of enrolment, I would have an advantage because I 
know all my friends will give up their place. In this case, I am not only playing with 
my number, but also with the 30 numbers before me that belonged to my friends. 
You may think this is an overly sophisticated way of increasing one's chances, but it
was actually used for a considerable period of time, and as a result the procedure
has been changed: numbers are no longer allocated based on sequential pre-enrolment.

The system for allocating social housing has also been changed for the same
reason, since the draw was carried out in the same way. However, in this case the
inequality is less relevant, because signing up with the intention of turning down
a place carries the risk of being blacklisted in future.
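The effect of friends who give up their places can be counted exactly in a small hypothetical case. The sketch below assumes 200 applicants numbered 0 to 199 in order of pre-enrolment, 50 places, a starting number drawn uniformly, places passing to the following numbers whenever someone declines, and numbering that wraps round from the last applicant to the first; none of these figures or details comes from the original scheme, and `p_selected` is an illustrative name.

```python
from fractions import Fraction

def p_selected(n: int, places: int, me: int, decliners: set[int]) -> Fraction:
    """Hypothetical model: applicants 0..n-1 in pre-enrolment order; a start number is
    drawn uniformly and places go to it and the numbers after it (wrapping round),
    skipping anyone who declines. Returns the chance that applicant `me` gets a place."""
    favourable = 0
    for start in range(n):
        given, pos = 0, start
        for _ in range(n):                # walk at most once round the list
            if pos not in decliners:
                given += 1                # this applicant accepts a place
                if pos == me:
                    favourable += 1
                    break
                if given == places:       # all places filled before reaching me
                    break
            pos = (pos + 1) % n
    return Fraction(favourable, n)

print(p_selected(200, 50, me=130, decliners=set()))                 # 1/4
print(p_selected(200, 50, me=130, decliners=set(range(100, 130))))  # 2/5
```

With nobody declining, applicant 130 has the same 1/4 chance as everyone else; with 30 declining friends in positions 100 to 129, the chance rises to 2/5 under these assumptions.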

Let's now return to the Vietnam War draw. Among the dates that came up first in the
draw, those in the later months of the year clearly outnumbered those in the other
months. Given that dates of birth are randomly distributed (although this could also
constitute another debate), a similarly random distribution of dates was to be expected. What
went wrong? The capsules with the dates of birth were introduced into the drum 
beginning with 1 January, and so on, in strict date order until reaching 31 December. 
However, they were removed without having been mixed properly, meaning the 
proportion of those towards the end of the year, which were on top, was much 
higher than expected. How could this have been done in a simpler and fairer way? 
For example, by drawing just one date at random, after mixing the drum properly, and
calling up the required number of men born from that date onwards. Quicker and fairer.

Let us now consider the Spanish case. Since the first drum contained five balls of
each kind, a 0 and a 1 were equally likely, even though the block of numbers beginning
with 0 (up to 99,999) is much larger than the block beginning with 1 (from 100,000 to
165,342). As a result, the probability of a number being drawn (and, consequently, of
its holder heading the list of those declared in excess of the quota) depended on that
number. Hence, forgetting the issue of the possible repetition of the draw for the
second drum, the probability of any particular number between 1 and 99,999 coming
up is

$$\frac{1}{2}\cdot\frac{1}{100{,}000} = 0.000005,$$

whereas for any particular number between 100,000 and 165,342 it is

$$\frac{1}{2}\cdot\frac{1}{65{,}343} \approx 0.00000765.$$


Hence, the numbers from the latter group have a much higher probability of being
drawn than those in the former, around 50% more. The number that came up in the
draw was 155,611. For its first 5 (the tens-of-thousands digit), it was necessary to
repeat the draw for that drum because an 8 (greater than 6) came up first. The
conscripts from this number onwards were exempted up to the last number, and the
count then continued from 1 until a total of 16,442 had been exempted.
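The two probabilities and their ratio can be checked with exact fractions. The sketch follows the same simplification as the text (ignoring the repetition rule for the second drum) and treats the number drawn as equally likely within each block once the first drum has been drawn; that within-block uniformity is our assumption.

```python
from fractions import Fraction

# Simplified model, as in the text: the repeat rule for the second drum is ignored,
# the first drum gives 0 or 1 with probability 1/2, and (our assumption) the number
# drawn is then equally likely within the corresponding block.
p_low = Fraction(1, 2) * Fraction(1, 100_000)   # each number from 1 to 99,999
p_high = Fraction(1, 2) * Fraction(1, 65_343)   # each number from 100,000 to 165,342

print(float(p_low))                     # 5e-06
print(f"{float(p_high):.8f}")           # 0.00000765
print(f"{float(p_high / p_low):.2f}")   # 1.53, i.e. around 50% more
```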

Let us consider one final point before moving on from the Spanish draw. When
faced with questions from journalists about probabilities and other issues, the
politician responsible for its organisation, the current undersecretary of defence,
was not ashamed to answer: “My background is in the humanities and I am not an
expert on probabilistic arguments”. The draw was never repeated because, according
to information provided by the army, “the equal opportunities of all those involved
had been preserved, since the allocation of numbers to each had been carried out
randomly”. This last statement is correct, since to calculate the real probability that
a conscript's number would come up in the draw, the previous probabilities would
need to be multiplied by the probability of being assigned a number between 1 and
99,999, or between 100,000 and 165,342. This example shows that many people find
probability difficult to grasp. In spite of the ignorance shown by those responsible
for the draw, it was fair, although some of the explanations provided are best
forgotten. In the following section, we shall see how to conduct a fair draw under
such circumstances in a different context.
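The army's argument can also be checked numerically. The sketch below uses a simplified model (each block chosen with probability 1/2 and the starting number then equally likely within it, with the lower block taken as exactly the 99,999 numbers from 1 to 99,999) to compute, for every number, the probability that it falls among the 16,442 exempted numbers counted from the start, wrapping round from 165,342 to 1. Individual numbers have unequal chances, but because each conscript's number was itself assigned at random, each conscript's overall chance is the average over all numbers, which should come out as 16,442/165,342.

```python
from itertools import accumulate

N, K = 165_342, 16_442  # conscripts called up and exemptions, as in the text
LOW = 99_999            # numbers 1..99,999 come from a 0 in the first drum

# Assumed simplified model: each block is chosen with probability 1/2 and the
# starting number is then equally likely within that block.
p_start = [1 / (2 * LOW) if n <= LOW else 1 / (2 * (N - LOW)) for n in range(1, N + 1)]

# Number n is exempted if the start is one of the K numbers ending at n,
# counting backwards and wrapping round from 1 to N. Doubling the list and
# taking prefix sums turns each wrapped window into a single subtraction.
prefix = [0.0] + list(accumulate(p_start + p_start))
p_exempt = [prefix[i + N + 1] - prefix[i + N + 1 - K] for i in range(N)]

# Each conscript's number was assigned at random, so their overall chance
# is the average of p_exempt over all numbers.
overall = sum(p_exempt) / N

print(f"least favoured number: {min(p_exempt):.4f}, most favoured: {max(p_exempt):.4f}")
print(f"overall chance: {overall:.6f}, 16,442/165,342 = {K / N:.6f}")
```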

For a similar reason, choosing the members of a panel, jury, committee or similar
body by drawing the first two letters of the candidates' surnames is not fair. Not all
surnames are equally common, and someone whose surname is ‘Brunswick’, for
example, will almost never be chosen, since their only chance is that ‘BR’ comes up,
and even then a common name such as ‘Brown’ is more likely to be selected.


An official well-designed draw 


The Government of Aragon asked the Department of Statistical Methods at the 
University of Saragossa to design a draw for the allocation of subsidised housing 
to ensure the procedure was fair for all participants. We shall consider a part of the 
study relating to draws with large numbers of participants. 

When housing developments are ready for allocation they attract a large number 
of applicants. A draw with one ball per participant can be a complicated task, for
various reasons, not least because of the participants' right to check that their
numbers were actually entered. For this reason, in such cases it is recommended that
the draw is carried out using a procedure with multiple drums, one drum per digit.
Each drum must contain 10 balls, numbered from 0 to 9. From now on, in order to
simplify our explanation, let us assume there are more than 10,000 participants (and
at most 100,000), so that we need five drums, the first of which corresponds to the
units column; the second to tens; the third to hundreds; the fourth to thousands; and
the fifth to tens of thousands. If the number of participants is greater than 1,000 and
at most 10,000, the method detailed will be valid, but in this case there will be four
drums, meaning the procedure for the fifth drum will now apply to the fourth.
Similarly, if the draw involves more than 100 people and at most 1,000, only three
drums will be needed, and the procedure for the fifth drum will be applied to the
third. Departing from the above procedure, for draws of 100 people or fewer, it is
recommended to use a single drum. However, if a decision is taken to use multiple
drums, only two are required, with the tens drum following the procedure for the
fifth drum in our example.

It should also be noted that, when it comes to assigning numbers, in principle 
zero must be regarded as a normal number, such that if we are considering 10,000 
numbers, the values go from 0 to 9,999, in a similar way to how lottery draws 
operate. In many cases, for various reasons, it is common not to assign 0 to any of 
the participants, leading to a slight modification in the procedure, which does not 
affect the fairness of the draw and which will be described at the end. It is necessary 
to make this modification since zero is just as likely to appear as any other number
when using the multiple-drum procedure¹.


¹ Zero is considered by some to be an ‘ugly’ number, which is why it is not assigned by the organiser of the draw.

We'll now consider the possible situations. Remember we are dealing with a
draw involving five drums (there are more than 10,000 participants in the draw).
Let's assume, for example, that the number of participants is a multiple of 10,000,
such as 30,000, meaning that the participants have numbers between 0 and 29,999.
This means that in this case only the balls numbered from 0 to 2 should be included
in the fifth drum. Hence, to decide the winning number, it is only necessary to
extract a ball from each drum, and the result is always one of the participating numbers.

Let us now consider the case in which the number of participants is not a multiple 
of 10,000, such as 53,427. In this case, the numbers from 0 to 5 should be included 
in the drum corresponding to the tens of thousands, and we should proceed as in the 
case above, extracting a ball from each drum to give a number between 0 and 59,999.
However, this time, if the resulting number corresponds to a number that does not 
exist, all the balls that have been extracted should be put back into the corresponding 
drums, as if no balls had been extracted, and they should be extracted from the drums 
again to give a new number. The procedure is repeated as many times as necessary 
until a valid number comes up².
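The multiple-drum procedure with rejection can be written as a short function. In this sketch the name `multi_drum_draw` and its parameters are our own; it assumes the numbering runs from 1 to N (or from 0 to N - 1 when zero is assigned), fills the leading drum only up to the leading digit of the largest valid number, and redraws all the drums whenever the result is not a valid number. The usage lines check the frequencies for a small case with 53 participants, where two drums (0 to 5 and 0 to 9) are enough.

```python
import random
from collections import Counter

def multi_drum_draw(n_participants: int, rng: random.Random, zero_assigned: bool = False) -> int:
    """Hypothetical helper: one winner by the multiple-drum method with rejection.

    Valid numbers are 1..n_participants, or 0..n_participants-1 if zero is assigned.
    The leading drum holds balls 0..(leading digit of the largest valid number);
    every other drum holds 0..9. Invalid results are discarded and all drums redrawn.
    """
    highest = n_participants - 1 if zero_assigned else n_participants
    lowest = 0 if zero_assigned else 1
    digits = len(str(highest))          # number of drums
    lead_max = int(str(highest)[0])     # highest ball in the leading drum
    while True:
        number = rng.randint(0, lead_max)               # leading drum
        for _ in range(digits - 1):
            number = number * 10 + rng.randint(0, 9)    # remaining drums, one ball each
        if lowest <= number <= highest:                 # valid: this is the winner
            return number
        # otherwise all balls go back into their drums and we draw again

rng = random.Random(3)
counts = Counter(multi_drum_draw(53, rng) for _ in range(53_000))
print(len(counts), min(counts.values()), max(counts.values()))
```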

In order to see that the procedure is fair for all participants of the draw, we shall 
consider a specific example to aid our explanation, without assigning the number 
zero to anybody. Let us assume that the participants are numbered from 1 to 53,427; 
the procedure described consists of using five drums (units, tens, hundreds, thousands 
and tens of thousands), placing the balls numbered from 0 to 9 into the first four 
drums, and balls numbered from 0 to 5 into the fifth. One ball is removed from each 
of the drums to give a number: if this is between 1 and 53,427, this is the ‘winning’ 
number; otherwise, if the number that comes up is 0 (00000) or is greater than 
53,427, all the balls are returned to their corresponding drums and another five are 
removed, repeating the process until obtaining a number between 1 and 53,427. 

To check this procedure is fair, we must ensure all the numbers between 1 and 
53,427 have the same probability (1/53,427) of being ‘winning’ numbers. Let us 
consider a specific number, for example, 12,525, and calculate the probability that 
it is the winning number. 

First of all, note that when one ball is extracted from each of the drums, every
number between 00000 and 59,999 has the same probability of coming up, since the
probability of removing a specific ball from each drum is one divided by the number
of balls in it:

$$\frac{1}{10}\cdot\frac{1}{10}\cdot\frac{1}{10}\cdot\frac{1}{10}\cdot\frac{1}{6} = \frac{1}{60{,}000}.$$

Hence, the probability of obtaining the number 12,525 on the first attempt is 1/60,000.

² It would also be correct to place balls from 0 to 9 in the fifth drum, proceeding in the manner explained
above and discarding all the attempts that correspond to nonexistent numbers. However, this way of operating
is often tedious, since many attempts may be required due to the quantity of nonexistent numbers.


However, it may be the case that the first attempt results in an invalid number and
the procedure must be repeated, meaning that once again the number 12,525 may
come up. For this to be the case, the first attempt must be invalid, meaning that
either 00000 or a number above 53,427 must come up. In total, there are 6,573 such
invalid numbers (the 6,572 numbers between 53,428 and 59,999, plus 00000), meaning
that the probability of an invalid number coming up on the first attempt is:

$$\frac{\text{invalid numbers}}{\text{total numbers}} = \frac{6{,}573}{60{,}000}.$$


Hence, the probability of obtaining the number 12,525 on the second attempt
is the probability that the first attempt is invalid, 6,573/60,000, multiplied by the
probability that the number 12,525 comes up on the second attempt, which is
1/60,000. Hence the probability is:

$$\frac{6{,}573}{60{,}000}\cdot\frac{1}{60{,}000} = \frac{6{,}573}{(60{,}000)^2}.$$


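Repeating the calculation for the case in which the first n attempts are invalid
and the number obtained on attempt n + 1 is 12,525, and adding up all the
probabilities, we obtain the end result that the probability of 12,525 being the
winning number is:

$$\frac{1}{60{,}000} + \frac{6{,}573}{60{,}000}\cdot\frac{1}{60{,}000} + \left(\frac{6{,}573}{60{,}000}\right)^{2}\frac{1}{60{,}000} + \cdots = \frac{1}{60{,}000}\cdot\frac{1}{1 - \frac{6{,}573}{60{,}000}} = \frac{1}{60{,}000}\cdot\frac{60{,}000}{53{,}427} = \frac{1}{53{,}427}.$$

Since nothing in this calculation depends on the particular number chosen, every
number between 1 and 53,427 has the same probability, 1/53,427, of being the
winning number, hence proving the procedure is fair.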
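The infinite sum can be verified with exact fractions. The sketch computes the closed form of the geometric series, (1/60,000)/(1 - 6,573/60,000), and compares it with the sum of the first ten terms, which already agrees with it to many decimal places.

```python
from fractions import Fraction

p_once = Fraction(1, 60_000)         # a given number on any single attempt
q = Fraction(6_573, 60_000)          # an attempt is invalid (00000 or 53,428..59,999)

# Winning on attempt n+1 requires n invalid attempts first: p_once * q**n.
first_ten = sum(p_once * q**n for n in range(10))
closed_form = p_once / (1 - q)       # sum of the whole geometric series

print(closed_form)                      # 1/53427
print(float(closed_form - first_ten))   # tiny remainder left after ten attempts
```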


Lotteries and mathematical expectation 


In our society there is always resistance when it comes to paying taxes, except for
one that is paid voluntarily and with pleasure: an official lottery. It is a tax because,
without a doubt, the state is always the winner. This has led to many pointed words
being written on the subject, such as the English writer Henry Fielding's (1707-1754)
ode: “A Lottery is a Taxation, // Upon all the Fools in Creation; // And
Heav'n be prais’d, // It is easily rais'd, // Credulity’s always in Fashion; // For,
Folly’s a Fund, // Will never lose Ground; // While Fools are so rife in the Nation.”


BADLY DESIGNED LOTTERIES 


The state (or organiser) always wins except in the case of badly designed lotteries. There are 
various examples of such lotteries, such as the monthly lottery organised by the city of Paris 
in 1728 to raise funds to tackle its debts. 

The famous thinker Voltaire (1694-1778) once said: “Chance is a word that is devoid of
meaning; nothing can exist without a cause”, professing the deterministic faith of the
Enlightenment. Nevertheless, with the help of a friend who was a mathematician, he realised
that he was in a dream situation: the lottery was designed in such a way that the total value
of the prizes exceeded the cost of all the tickets, so he would be sure to make a profit if he
bought all the numbers available. And this is what he did, forming a group of friends to help
him, which contributed to his financial well-being and in turn allowed him to continue creating
his works of literature as well as pursuing his various business activities.

The analysis of games of chance is undoubtedly of mathematical interest, but it is
also of interest from the perspective of society. If we think about the large number of
people and the amounts of money involved, it is an important issue in all countries,
with the statistics showing that Spain in particular is a global power when it comes
to gambling. The Spanish Christmas Lottery has the largest prize fund in the world,
and this is how it is presented in the international media, with a message (at least
implicit) that this means the probability of winning a prize is higher.

Given that in many games like the lottery, there is the chance of winning a 
range of prizes, an important concept is the mathematical expectation of earnings, 
which provides us with information about the average value we may obtain. 

It is calculated by determining the probability of each prize using the classical
method: the number of cases in which the prize is won divided by the total number
of possible cases. For example, in a draw with 100 tickets or numbers, one jackpot
and two second prizes, the probability of winning the jackpot with one ticket is
1/100 = 0.01 (or 1%), whereas the probability of winning a second prize is
2/100 = 0.02 (2%); the probability of not winning a prize is 97/100 = 0.97 (97%).

The mathematical expectation for the earnings of a game is the sum of the 
products of the possible winnings and the probabilities of obtaining them, 
minus the cost of participating. In the previous raffle, if each ticket costs £5, 


THE BUSINESS OF CHANCE 


The sums of money bet by various players on various different games of chance (including 
lotteries, casinos, bingo and other amusements) can represent percentages as high as 3% 
of GDP for certain Western countries. This level of expenditure stays constant in times of 
economic hardship, as has been shown in recent years. When it comes to other social aspects 
of draws with large prizes, we are not only dealing with the amount of money that can be 
won, but with a whole range of preconceptions and prejudices — the luck of the player, the 
strange ‘hunches’ for certain numbers, premonitions and the places where the tickets are 
purchased. (For example, a place is sometimes regarded as better if a disaster has occurred
there since, by some sort of divine or extra-terrestrial compensation, the probability of winning
is greater.) Consider the example of a person who won the jackpot with a number ending
in 48, who explained why they had chosen it: “I dreamt of a seven, seven nights in a row,
and since seven times seven gives 48, I bought number 48!” How could we convince this
visionary that 7 · 7 = 49?

the first prize is £100, and the two second prizes are £40 each, the expectation is

$$100 \cdot 0.01 + 40 \cdot 0.02 - 5 = 1 + 0.8 - 5 = -3.2,$$

that is, -£3.20. If the expectation is a negative number (as is the case here), the
game is unfavourable to the player; if it is positive (which does not often happen,
except in the event of a flawed design), it is favourable; and if it is zero, it is a
balanced or fair game for both parties.
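The same calculation can be written as a small function. In this sketch, `expectation` is an illustrative name; the prizes are given as a mapping from each prize value to the number of tickets that win it, and exact fractions avoid rounding.

```python
from fractions import Fraction

def expectation(prizes: dict[int, int], tickets: int, cost: Fraction) -> Fraction:
    """Sum of (prize x probability of winning it) minus the cost of taking part.

    `prizes` maps each prize value to the number of tickets that win that prize;
    each probability is computed with the classical rule (favourable / possible)."""
    gain = sum(Fraction(value * count, tickets) for value, count in prizes.items())
    return gain - cost

e = expectation({100: 1, 40: 2}, tickets=100, cost=Fraction(5))
print(e, float(e))  # -16/5 -3.2
```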


The Spanish Christmas lottery 


In Spain, the Christmas Lottery is largely played as a tradition. It is practically a 
social obligation together with the traditional nougat sweets, presents and family 
celebrations, and is often played at work, with friends or family, in spite of the fact 
that they are actually making a voluntary tax contribution. The prizes for the 85,000 
Christmas numbers, which range from 00000 to 84,999, are listed on the ticket. The 
probability of winning the jackpot with a single number is 1/85,000 ≈ 0.00001176,
the same as winning the second or third prize. The probabilities of winning one of the
two fourth prizes or the eight fifth prizes are 2/85,000 and 8/85,000, respectively;
the probability of winning one of the 1,774 consolation prizes is 1,774 times greater
than that of winning the jackpot. Adding together all the prizes gives a total of 13,334
winning numbers (including first-three-digit matches, endings and other possibilities),
meaning that the probability that a player receives a prize with a single tenth of a
ticket is 13,334/85,000 ≈ 0.1569, a little under 1/6, or just under 16%. Stated in terms
of the players, if a person regularly plays 25 different numbers every year, they can
expect on average about four prizes a year, half of which will just be refunds of the
stake.

Ticket for the Spanish Christmas Lottery draw on 22 December 2009.

Multiplying each prize of the lottery by the probability of obtaining it and adding
the results shows that the mathematical expectation of the earnings for a tenth of
a ticket (which costs €20) is -€6 (30% of the price of the ticket); in other words,
the game is unfavourable to the players and, of course, profitable to the state.
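The figures for the Christmas Lottery can be reproduced in a few lines, using only the numbers quoted above (85,000 numbers, 13,334 of which win something, and an expectation of -€6 on a €20 tenth); the constant names are our own.

```python
from fractions import Fraction

NUMBERS = 85_000              # Christmas numbers, 00000 to 84,999
WINNING_NUMBERS = 13_334      # numbers receiving some prize, as quoted above
PRICE, EXPECTATION = 20, -6   # euros per tenth of a ticket, as quoted above

p_jackpot = Fraction(1, NUMBERS)
p_any_prize = Fraction(WINNING_NUMBERS, NUMBERS)

print(f"{float(p_jackpot):.8f}")         # 0.00001176
print(f"{float(p_any_prize):.4f}")       # 0.1569
print(f"{25 * float(p_any_prize):.2f}")  # prizes expected a year with 25 numbers: 3.92
print(f"{EXPECTATION / PRICE:.0%}")      # -30%
```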


A CHRISTMAS CAROL 


John has a friend, whose gender he does not tell us and who lives in Ciudad Real. He has not 
seen this friend for a long time (and does not know their current address). The last time they 
met, both promised to send each other a book they liked. A few days ago, he met another 
friend, Lenny, who told him he was travelling to Ciudad Real. Haunted by the ghosts of his 
unfulfilled promise, John gives Lenny the book for his friend. To avoid damaging it, Lenny 
places it in a large envelope and sets off for Ciudad Real. When he arrives, he parks his car 
in a street in which he finally finds a space, gets out and goes up to the first person he sees 
and hands them the envelope: “Take this book, it's from your friend John in Saragossa.” And
the person, picked at random, replies: “That's great, he remembered! I've been waiting
for it for years!”

Even if we are assured that the story is true, does it not seem incredible that this is
exactly the person he was looking for? Of course it does! The probability of this happening is
only marginally greater than that of any of us buying the winning ticket in the Christmas
Lottery. Indeed, the population of Ciudad Real is 74,213, according to the municipal electoral
register as of 1 January 2009, and there were 85,000 Christmas Lottery numbers in 2009.


The winner always comes from a city 


The fact that the Christmas Lottery is so popular and that so many people take part 
in the draw has given rise to strategies that attempt to improve the extremely small 
probability of winning a significant prize. 

An enthusiast of games of chance who lives in a small town will always ask a
family member from a larger city to buy their tickets for them: “Can you buy them,
because the winner always comes from a city?” Does this improve the chances
of winning? Obviously not, but it is a widely held belief that makes many people
purchase their lottery tickets in a famous city. If many people purchase lottery tickets
in a specific place, this will increase the probability that the winner is located there,
since more numbers will be sold there. However, it will not increase the probability that
a specific number purchased there will win.

The optimism can be even greater, as in the case of a person who, in order to
maximise their chances of winning, reasons: “I buy my number for the Christmas
Lottery as soon as the tickets are put on sale; that way I know they still haven’t sold the
winning ticket.” This calls to mind the situation in which balls are taken one at a time out of a
bag containing many balls, only one of which wins a prize: the order in which they are
taken out does not matter.
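The claim that the order does not matter can be checked exactly. The sketch below (the function name is our own, hypothetical choice) computes the probability that the single winning ball is the k-th one drawn without replacement; every position turns out to have probability 1/N, so buying early or late makes no difference.

```python
from fractions import Fraction

def prob_winner_at_position(n_balls: int, k: int) -> Fraction:
    """Probability that the single winning ball is the k-th one drawn (1-based),
    drawing without replacement from a bag of n_balls balls."""
    p_not_yet = Fraction(1)
    # The first k - 1 draws must all miss the winning ball
    for i in range(k - 1):
        remaining = n_balls - i
        p_not_yet *= Fraction(remaining - 1, remaining)
    # The k-th draw then picks it from the balls that are left
    return p_not_yet * Fraction(1, n_balls - (k - 1))

N = 10
print([str(prob_winner_at_position(N, k)) for k in range(1, N + 1)])
```

The product telescopes: the chance of surviving k - 1 draws is (N - k + 1)/N, and the chance of then hitting the winner is 1/(N - k + 1), leaving 1/N whatever the value of k.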

Finally, even if we grant the argument with which we began this section (that the winner
is more likely to come from a city, because more numbers are bought there), we might
imagine that there is some procedure that makes it easier for us to win the Christmas Lottery.
Unfortunately, this is obviously not the case.


Primitive lotteries 


Lotteries referred to as primitive, or lottos, which involve choosing m numbers 
between 1 and N, are extremely popular in many countries. In the United Kingdom 
the Lotto involves drawing 6/59; in Switzerland it is 6/42; in New Zealand it is 
6/40, and in Sweden there are two possibilities, the 7/35 and the 6/48. In Spain
there are three: the Primitiva Lottery and the BonoLotto, both 6/49, and the ONCE
lotto, 1/100,000.

In all these cases, with slight variations (the reasons for which we shall analyse
later), there is a prize for guessing all or some of the lucky numbers. In the case
of the Spanish Primitiva Lottery, prizes are paid for guessing 6 numbers; 5 numbers plus
another ball drawn separately (the bonus number); 5; 4; or 3.

Let us focus our analysis on the latter. To win the first prize, players need to guess 
6 numbers drawn between 1 and 49. The total number of different ways of choosing
6 numbers from the 49 possibilities (given that the order in which they are chosen
does not matter) is, following the reasoning from chapter 1:

$$\binom{49}{6} = \frac{49\cdot 48\cdot 47\cdot 46\cdot 45\cdot 44}{6!} = 13{,}983{,}816.$$

Almost 14 million! This means that the probability of guessing the 6 numbers
with a single ticket is a small number indeed:

$$p = \frac{1}{13{,}983{,}816} \approx 0.000007\%.$$
With respect to the possibility of winning the other prizes, the probabilities for the
individual cases are

$$p = \frac{\text{favourable cases}}{13{,}983{,}816},$$

making it necessary to calculate the favourable cases for each number of correct guesses.

To get exactly 5 numbers right, we must miss one of the six winning numbers, and in place of
that ‘failure’ any of the other 49 − 6 = 43 numbers can be chosen, meaning there are
6 · 43 = 258 possibilities for getting 5 numbers. This value includes the 6 cases in which
the bonus number is guessed correctly and the 252 in which it is not. Using
a similar method, there are $\binom{6}{4}\binom{43}{2} = 13{,}545$ favourable cases for getting 4 numbers
right. For 3 numbers, the number is $\binom{6}{3}\binom{43}{3} = 246{,}820$. Hence the probabilities of
winning are:

5 + bonus: p = 6/13,983,816 = 1/2,330,636.
5: p = 252/13,983,816 ≈ 1/55,491.
4: p = 13,545/13,983,816 ≈ 1/1,032.
3: p = 246,820/13,983,816 ≈ 1/57.

Picking the lottery numbers.
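The counting above can be reproduced in a few lines of Python using `math.comb`; this is only a sketch that mirrors the reasoning in the text, and the dictionary labels are our own.

```python
from math import comb

TOTAL = comb(49, 6)  # 13,983,816 ways of choosing 6 numbers from 49

# Favourable cases for each prize, counted as in the text
favourable = {
    "6": 1,
    "5 + bonus": 6,                  # miss one winning number and pick the bonus instead
    "5": 6 * 43 - 6,                 # 258 ways of getting exactly 5, minus the 5 + bonus cases
    "4": comb(6, 4) * comb(43, 2),   # 15 * 903
    "3": comb(6, 3) * comb(43, 3),   # 20 * 12,341
}

for category, cases in favourable.items():
    print(f"{category:>9}: {cases:>7,} cases, about 1 in {TOTAL / cases:,.0f}")

p_any_prize = sum(favourable.values()) / TOTAL
print(f"probability of winning some prize: {p_any_prize:.2%}")
```

The last line adds the jackpot to the four categories listed, which is how the overall figure of 1.86% is obtained.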


Adding all these together, along with the jackpot, the probability of winning a prize is 1.86%.
As we can see, these are extremely small quantities. Playing every week, all 52 weeks of the
year, we could expect to get 3 numbers right about once a year. The remainder of the


HOW MANY NUMBERS?


Why is the Spanish lottery 6/49 and the Swiss one 6/45? The answer is related to the number
of inhabitants of the country where the lottery is held, which determines the likely
number of players and, hence, the chance that someone wins the large prizes. This is perhaps
better seen using an example. In Catalonia, with 7 million inhabitants, it is possible to play
the Lotto 6/49, but since the number of possible bets is almost 14 million, it is often
the case that there are no winners with 6 numbers or even with 5 plus the bonus ball. Indeed, of
the ten consecutive draws that took place between 19 September and 21 October 2009,
eight produced no winner for 6 numbers or for 5 + bonus. Only in the draw of 7
October was there a winner in each category: the 6 numbers won €2,453,000, and
5 + bonus won €50,643. In the 14 October draw, there was no winner with 6 numbers and just
one winner with 5 + bonus, who won €11,600. A game that bases its powers of attraction on
extremely large prizes thus loses part of its appeal. On the other hand, this lottery is suitable
for Spain as a whole since, as the number of potential participants increases, so does the
average number of winners.
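To see why a small population leaves the jackpot unclaimed so often, the sketch below assumes (unrealistically) that every bet is an independent, uniformly random choice among the 13,983,816 combinations; real players favour certain numbers. The numbers of bets per draw are hypothetical values chosen only for illustration, not figures from the draws mentioned above.

```python
from math import comb, exp, log1p

N_COMBINATIONS = comb(49, 6)

def expected_jackpot_winners(bets: int, n_combinations: int = N_COMBINATIONS) -> float:
    """Average number of jackpot winners among `bets` random bets."""
    return bets / n_combinations

def p_no_jackpot_winner(bets: int, n_combinations: int = N_COMBINATIONS) -> float:
    """Chance that none of `bets` independent, uniformly random bets hits the jackpot:
    (1 - 1/N) ** bets, computed with log1p for numerical accuracy."""
    return exp(bets * log1p(-1 / n_combinations))

# Hypothetical numbers of bets per draw, for illustration only
for bets in (2_000_000, 7_000_000, 20_000_000):
    print(f"{bets:>10,} bets: expected winners {expected_jackpot_winners(bets):.2f}, "
          f"P(no jackpot winner) = {p_no_jackpot_winner(bets):.0%}")
```

As the number of bets grows, the expected number of winners rises in proportion and the chance of a draw without a jackpot winner falls, which is the point made above about Spain as a whole.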

Since 2004, the lottos of individual countries have had to compete with an even more
complicated game, the Euromillions, which is played jointly in various countries throughout
Europe. Players have to choose a group of five numbers from 1 to 50 and two stars numbered
from 1 to 9. Hence, the number of different bets that can be placed, following a similar
argument to that used above, is:

$$\binom{50}{5}\binom{9}{2} = \frac{50\cdot 49\cdot 48\cdot 47\cdot 46}{5!}\cdot\frac{9\cdot 8}{2!} = 2{,}118{,}760\cdot 36 = 76{,}275{,}360.$$

There are more than five times as many possible bets as in the Spanish lotto,
making the probability of winning the jackpot more than five times lower. That said,
the possible prizes are huge.
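The Euromillions count, and its ratio to the 6/49 lotto, can be checked directly; this sketch uses the star range of 1 to 9 given above.

```python
from math import comb

euromillions = comb(50, 5) * comb(9, 2)   # five numbers from 1-50 and two stars from 1-9
lotto_6_49 = comb(49, 6)

print(f"Euromillions bets: {euromillions:,}")
print(f"times as many as in the 6/49 lotto: {euromillions / lotto_6_49:.2f}")
```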
probabilities are so small that we cannot expect greater prizes with any meaningful
frequency; we would have to wait about 20 years, on average, just to get 4 numbers right.
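These waiting times can be estimated by assuming one bet per weekly draw and treating the wait until the first success as geometric, with an average of 1/p draws; this modelling assumption is ours, not the text's.

```python
from math import comb

TOTAL = comb(49, 6)
DRAWS_PER_YEAR = 52   # one bet in every weekly draw

# Favourable cases for exactly 3 and exactly 4 correct numbers
favourable = {
    "3 numbers": comb(6, 3) * comb(43, 3),
    "4 numbers": comb(6, 4) * comb(43, 2),
}

def mean_years_to_first_success(cases: int) -> float:
    """Average wait in years, assuming a geometric waiting time with mean 1/p draws."""
    p = cases / TOTAL
    return 1 / (p * DRAWS_PER_YEAR)

for label, cases in favourable.items():
    print(f"{label}: on average once every {mean_years_to_first_success(cases):.1f} years")
```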


The attraction of lotteries and ‘Pascal's Wager’ 


Why are lottos so popular? Unconsciously, we argue along the lines of Pascal’s 
Wager. The mathematical expectation is the same as in many other lotteries, since it 
depends on the percentage of the revenue used for the prizes. However, the prizes are 
distributed differently and can be much, much larger. The probability of winning a 
big prize is extremely low, but when this occurs we become seriously rich overnight. 
The social decision is that it is worth the risk: that’s why we play! 
The Advantages of Being ‘Normal’


Large numbers 


If we throw a die and ask someone what the probability of getting a four is, it is
highly likely they will answer 1/6, even though they may not have a very clear
idea of the meaning of probability (unless they have read the previous chapters of
this book). For this reason, it may be interesting to ask them what they understand
when we say the probability is ‘one sixth’. Even if they are not able to answer the
first question and we tell them that the probability is 1/6, the second question still
makes sense.

Answers to the second question may vary considerably, depending on the
knowledge of the person answering. We frequently come across the following three answers:


1. The die has six faces, any of which can come up. 

2. If we throw the die many times, approximately one in six of these will 
give a four. 

3. If we bet in a game where we win or lose depending on whether the
number four comes up, a bet on the four should pay 5 to 1.


The first of these answers may be in line with our intuition and does not
require special mathematical knowledge. On the other hand, the third answer is
easily understood by people who are used to betting on various types of games and
reflects their experience as players more than their knowledge of probability.
The variety of games is so large and the interest in them so extensive that many
people will fervently defend this answer as being the most reasonable.

However, it is not difficult to see that, among people with limited or no knowledge
of probability, the second answer has the fewest adherents. Yet this is precisely the
answer that can be mathematically proven!
A few moments of reflection will allow us to conclude that this statement
is more complicated than it might at first appear, involving subtle
mathematical concepts. In fact, one could argue that, taken literally, the
statement can never hold exactly if we throw the die a number of times that is
not a multiple of six, no matter how many throws we make. What, then, is the second
statement really saying?

It says that if we throw the die indefinitely, or at least a very, very, very large 
number of times, the proportion of times in which a four comes up (or any other 
of the six possible numbers) will grow as close to 1/6 as we wish. In probability 
theory, these types of results are referred to as laws of large numbers (now it is clearer 
where the name comes from). Let us also recall the idea of statistical regularity from 
Chapter 4. If we are still sceptical about the truth of these laws, let us consider another 
example that will help dispel our doubts. 
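A computer can do the patient throwing for us. This minimal sketch simulates a fair die with Python's `random` module (the seed is arbitrary) and prints the proportion of fours for increasingly many throws; the proportions should drift towards 1/6, although no particular run is guaranteed to match it exactly.

```python
import random

def proportion_of_fours(n_throws: int, rng: random.Random) -> float:
    """Throw a fair die n_throws times and return the proportion of fours."""
    fours = sum(1 for _ in range(n_throws) if rng.randint(1, 6) == 4)
    return fours / n_throws

rng = random.Random(2009)   # arbitrary fixed seed, so the run can be repeated
for n in (60, 600, 6_000, 60_000, 600_000):
    print(f"{n:>7,} throws: proportion of fours = {proportion_of_fours(n, rng):.4f}"
          f" (1/6 = {1/6:.4f})")
```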

Roulette is one of the best-known games of chance, both inside and outside 
casinos. It is possibly one of the most widely played games throughout the world 
on a daily basis, and the prizes can reach significant sums. 

Essentially, European roulette is a cylinder with a rotating disc inside, divided into
37 compartments: 36 numbered from 1 to 36, alternating between red and black, plus
a 0 in a compartment of a different colour (green, for example).
The wheel must be very finely balanced, so that the ball is equally likely to end up
in any compartment. The aim of the game is to guess the number or colour of the
compartment in which the ball, thrown by the croupier onto the spinning disc, comes to rest. There are
variations on this model of roulette, such as American roulette, which also has the 
‘double zero’ and some slight variations on the betting and prizes. Here we shall 
discuss European roulette. 

First the bets are placed, then the wheel is spun vigorously and the ball is thrown 
by the croupier onto the outer part of the wheel, where it remains, spinning round. 
When the wheel slows down sufficiently, the ball falls into one of the compartments, 
and bounces around between numbers until finally coming to a rest in one of the 
37 compartments. 

The ‘magic’ of the wheel’s motion has captured the attention of humankind 
almost from its outset. The apparent stillness of the centre, together with the increase 
in speed as we move away from it and the uncertainty regarding the point on which 
the ball will come to rest has lain behind many wheel-based games, such as roulette. 

According to the available information, the creation of a roulette game with
rules highly similar to the ones we use today dates back to Pascal, who designed
a game with 36 numbers (without a zero). It would appear that the choice of 36
numbers establishes an even closer relationship with magic, since the sum of the first 36
positive integers gives that most magical number: 1 + 2 + 3 + ... + 34 + 35 + 36 = 666.
The number 36 also has the advantage of having many divisors.

Although there are many ways of betting on roulette, to simplify our explanation 
let us assume that a player only plays even/odd, red/black or passe/manque (passe is 
a bet on one of the higher numbers, 19-36, and manque refers to the lower numbers 
1-18). If they win, the player gets back the pound staked plus another pound.
What is the probability of winning this additional pound and what is the probability
of losing the pound that is bet?

It is important to point out that the number zero does not have a colour and 
nor is it considered, for the purposes of the game, to be odd or even, or included in 
the passe/manque bet. The probability of winning any of these bets is 18/37, and
the probability of losing is 19/37. The difference, 1/37 (2.7%), is in the casino's favour.
Given that we will win an additional pound with a probability of 18/37
and lose one pound with a probability of 19/37, the mathematical expectation or
average winnings for each bet is:


$$E = 1\cdot\frac{18}{37} + (-1)\cdot\frac{19}{37} = -\frac{1}{37} \approx -0.027,$$

meaning that in each bet we lose an average of 2.7 pence.

This situation is the same if we bet on a single number, since, in this case, the 
probability of winning is 1/37 and that of losing is 36/37, and if we bet one pound 
and win, we will receive 1 + 35 pounds. The average earning is also: 


$$E = 35\cdot\frac{1}{37} + (-1)\cdot\frac{36}{37} = -\frac{1}{37} \approx -\pounds 0.027.$$

Since the expectation is the same, we shall stick with our simple bets on red or black.
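Both expectations can be computed exactly with fractions. In this sketch, `expected_gain` is a hypothetical helper: a bet wins `payout` pounds with probability `p_win` and otherwise loses the one-pound stake.

```python
from fractions import Fraction

def expected_gain(p_win: Fraction, payout: int, stake: int = 1) -> Fraction:
    """Expected net gain: win `payout` with probability p_win, otherwise lose `stake`."""
    return payout * p_win - stake * (1 - p_win)

even_money = expected_gain(Fraction(18, 37), payout=1)     # red/black, even/odd, passe/manque
single_number = expected_gain(Fraction(1, 37), payout=35)  # straight bet on one number

print(even_money, single_number)   # both equal -1/37
print(f"{float(even_money):.4f} pounds per one-pound bet")
```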

Let us now imagine that we play 100 games, betting one pound on red or black
in each. At the end of all the games, will we have lost 100 × 0.027 = £2.70? All we can
say for sure is that we will have won or lost a whole number of pounds, so we cannot have
lost exactly this amount. However, will we be closer to having lost this sum or closer to
having won, for example, £50? These questions are similar to asking whether a four appears
in one sixth of the throws of a die.
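The questions about 100 games can be answered exactly with the binomial distribution, assuming the games are independent. The sketch below computes the expected net result and the probabilities of finishing ahead or of winning £50 or more.

```python
from math import comb

P_WIN = 18 / 37   # probability that an even-money bet on red wins
GAMES = 100

def prob_wins(k: int, n: int = GAMES, p: float = P_WIN) -> float:
    """Binomial probability of exactly k wins in n independent bets."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# With w wins in n one-pound bets, the net result is w - (n - w) = 2w - n pounds
expected_net = sum((2 * w - GAMES) * prob_wins(w) for w in range(GAMES + 1))
p_ahead = sum(prob_wins(w) for w in range(GAMES + 1) if 2 * w - GAMES > 0)
p_win_50 = sum(prob_wins(w) for w in range(GAMES + 1) if 2 * w - GAMES >= 50)

print(f"expected net result: £{expected_net:.2f}")
print(f"probability of finishing ahead: {p_ahead:.3f}")
print(f"probability of winning £50 or more: {p_win_50:.2e}")
```

The expected result is a loss of 100/37 pounds, but the chance of finishing ahead after 100 games is still substantial; winning £50 or more, on the other hand, is practically impossible.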
We can think of other situations with similar questions. When it is claimed that a
drug is 80% effective, does this mean that if it is given to a large number of patients,
it will work for about 80% of them? If the drug produces side-effects in 1% of
cases, does this mean that about 1% of the patients treated with the drug will suffer them?
The answer to these questions is given by the laws of large numbers, to which we 
shall now turn our attention. 

Let us return to our questions on roulette, dice or any other situation in which 
probabilities can be assigned to the occurrence of certain events or results. There are 
two types of questions related to the idea of probability. On the one hand, we can 
ask: Will the probabilities of the events be reflected in the results we obtain, and how might 
this occur? On the other hand, we can propose the opposite problem: Is it possible to 
infer the associated probabilities by examining the results we have obtained? 

The two questions are of a different nature, since they involve inverse forms of reasoning.
In the first case we wish to use the probabilities that are assigned to deduce the results 
we will obtain, whereas in the second we wish to use our observations to infer the 
probabilities that govern the phenomenon. The latter is the realm of statistics, and 
we shall return to it later. For now, though, let us focus on the first problem and 
attempt to answer the questions we asked at the start of the chapter. 


Bernoulli’s golden theorem 


Let us begin by throwing a die many times. After 300 throws, will we obtain 50 
fours? What happens if we throw it 3,000 times? The reader may think that no 
one in their right mind would take the time to throw a die thousands of times and 
write down the results. And if they are told that a mathematician has done this, it 
might just reinforce the popular idea that mathematicians are often a little bit mad. 
However, the only way of verifying the results predicted by the theorem is by carrying
out experiments, which are also a part of mathematics, bringing it closer to
the experimental sciences than is often believed. This is precisely what
the French naturalist Georges-Louis Leclerc, Comte de Buffon (1707-1788),
did, tossing a coin 4,040 times and obtaining 2,048 heads; that is, a proportion of
2,048/4,040 = 0.5069 (50.69%) were heads.

Certain special circumstances may also arise in which there are not many 
interesting things to do and it is useful to find tasks to keep our minds active 
and preserve our emotional balance. This was the case with the South African 
mathematician John Kerrich, who was taken prisoner during World War II and
who ‘entertained’ himself in prison by tossing a coin 10,000 times, obtaining 5,067
heads, a proportion of 5,067/10,000 = 0.5067 (50.67%). At various points throughout
the process, the percentage of heads deviated from the expected 50%, but as the
number of tosses increased, it grew closer to this value. The first 10 tosses yielded
only four heads (40%), followed by six heads on the next 10 tosses, meaning
that 20 tosses had given him exactly 10 heads, 50%! After 100 tosses, the proportion 
was only 44%, rising to 50.2% after 200, until finally reaching 5,067 heads after 
10,000 tosses, giving 50.67%, which was close to the expected value of 50% heads. 




MUCH EASIER 


Fortunately, today's computer simulations make it easy to repeat experiments of this type
as many times as we wish. For example, to toss a coin and observe the number of heads or
tails, as well as long runs of heads or tails, visit
http://nlvm.usu.edu/es/nav/frames_asid_305_g_3_t_5.html?from=topic_t_S.html, a virtual
simulation created by Utah State University, which also has models for other situations.
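In the same spirit, a few lines of Python can replay an experiment like Kerrich's. This sketch uses a pseudo-random generator with an arbitrary seed, so its figures will not coincide with Kerrich's; it simply reports the proportion of heads at the same checkpoints mentioned in his account.

```python
import random

def running_proportions(n_tosses: int, checkpoints, seed: int = 0) -> dict:
    """Toss a fair coin n_tosses times and record the proportion of heads at each checkpoint."""
    rng = random.Random(seed)
    wanted = set(checkpoints)
    heads = 0
    record = {}
    for i in range(1, n_tosses + 1):
        heads += rng.random() < 0.5   # True counts as 1 (a head)
        if i in wanted:
            record[i] = heads / i
    return record

# Checkpoints from Kerrich's account: 10, 20, 100, 200 and 10,000 tosses
for n, prop in running_proportions(10_000, (10, 20, 100, 200, 10_000), seed=1).items():
    print(f"after {n:>6,} tosses: {prop:.2%} heads")
```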


Jakob Bernoulli (1654-1705) spent a number of decades studying the problem 
and managed to provide a mathematical proof that the percentage of heads that 
would be obtained by tossing a coin indefinitely would tend without doubt towards 
50%. In the case of the die, his theorem shows that the proportion of fours tends 
towards 1/6. Bernoulli called his result the ‘golden theorem’, but current versions of 
the result are known as the ‘laws of large numbers’ (weak and strong laws). The term 
‘large numbers’ refers to the fact that the conclusions are true when the experiment 
is repeated indefinitely. However, it is clear we will not be able to repeat the process 
forever, not even with the help of a computer, no matter how powerful it may be. 
We can conduct the experiment a number of times, but in the end it will always 
be finite. What, then, does the conclusion really tell us? Or better still, what use is a
conclusion that we can never verify? 

At this point, the important mathematical concept of the limit comes into play, 
because what the laws of large numbers state is that the proportion of times a head 
comes up approaches the probability expected for the result as we increase the number 
of times the coin is tossed (or the die in the other example), 1/2 in the case of obtaining 
a head or 1/6 in the case of obtaining a four. 
Stamp in honour of Jakob Bernoulli and his laws of large numbers. For his epitaph (right),
Bernoulli chose the shape of the logarithmic spiral, in addition to the Latin motto Eadem
mutata resurgo (“Although changed, I shall arise the same”), which appears at the bottom.
However, the spiral carved by the stonemasons on his tomb was an Archimedean spiral
(photo: Wladyslaw Sojka).


The law of large numbers is a rule that everybody knows without prior
instruction, thanks to a certain natural instinct. It could be said that it forms part
of our DNA (coexisting with the false law of small numbers, which we mentioned 
in Chapter 3). 

To arrive at the ‘golden theorem’ in its original form, Bernoulli imagined an
urn with 5,000 balls, identical except for their colour: 3,000 white and 2,000
black. We then proceed as follows. We extract one ball, noting down its colour and
returning it to the urn (to avoid altering its original composition); we then extract
another ball and repeat the process many times (this procedure is called ‘extraction
with replacement’). It is clear that the chance of removing a white ball is 3 in 5, or
60%. The question asked by Bernoulli was: how closely will the proportion of white balls
drawn match this 60%, and with what probability can we guarantee that accuracy?

At first glance, it appears to be a play on words that is difficult to understand.
However, if we persist, we will see the depth of the problem proposed by Bernoulli.
Extracting 200 balls from the urn of white and black balls may yield 120 (60%),
100 (50%) or 125 (62.5%) white balls. However, what is the probability that the
percentage of white balls is between 55% and 65%? To be more specific, let us
consider the probability of obtaining a percentage of white balls close to
60%, for example, between 59% and 61%. Let us add a final question, which is
nonetheless just as important: does this probability increase if, instead of extracting
200 balls, we extract 1 million?
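Bernoulli's questions can be answered numerically for his urn (p = 3/5). The sketch below sums binomial probabilities, computed in logarithms with `math.lgamma` so that n = 1,000,000 causes no overflow, to find the chance that the percentage of white balls falls in a given window. The function names are our own.

```python
from math import exp, lgamma, log

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k white balls in n draws with replacement), computed in log space."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)

def prob_percentage_between(n: int, low_pct: int, high_pct: int, p: float = 0.6) -> float:
    """Probability that the percentage of white balls lies in [low_pct, high_pct]."""
    k_min = -(-low_pct * n // 100)   # ceiling of low_pct% of n, in integer arithmetic
    k_max = high_pct * n // 100      # floor of high_pct% of n
    return sum(binomial_pmf(k, n, p) for k in range(k_min, k_max + 1))

for n in (200, 2_000, 20_000, 1_000_000):
    print(f"n = {n:>9,}: P(55%-65%) = {prob_percentage_between(n, 55, 65):.4f}, "
          f"P(59%-61%) = {prob_percentage_between(n, 59, 61):.4f}")
```

With 200 balls the narrow window from 59% to 61% is hit well under half the time, while with a million balls it is practically certain, which is exactly the behaviour Bernoulli set out to prove.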

Questions of this nature, which were the purpose of Bernoulli's work, involve
two types of error or uncertainty, which we must define. On the one hand, there is
the deviation we are willing to allow from the real percentage of the balls (e.g. the
percentage obtained must be between 59% and 61%). On the other hand, we can never be
100% sure our percentage will fall within these margins, although we can require
this to be the case in a large share of the times we repeat the experiment. In other words,
we can ask for this to occur with a probability of 95%, that is, in 95% of the times
we repeat the experiment of extracting 200 balls (or 1 million balls!). It would seem
that we're not bothered about the amount of time we need for so many repetitions!

It turns out that both errors can be specified in advance.
Bernoulli showed that if we repeat the experiment a sufficient number of times, or


BERNOULLI'S THEOREM 


Let us assume that as the result of an experiment, we obtain a certain event, which we call
A, whose probability of occurring is p. We repeat the experiment n times in a row and note
how many of these give the result A. If the event A has appeared m times, the quotient m/n
represents the proportion of times A has appeared (the relative frequency of A). The absolute
value of the difference between the probability p and the relative frequency m/n, |m/n − p|,
measures the error made by using the relative frequency as an approximation of the real
probability.

Bernoulli showed that we can make the probability that this difference exceeds any given
amount as small as we like by repeating the experiment a sufficient number of times; that is,
this probability tends to zero as the value of n increases.

Mathematically speaking, this is expressed by saying that if ε is an arbitrarily small positive
number, the following holds:

$$\lim_{n\to\infty} P\left(\left|\frac{m}{n} - p\right| > \varepsilon\right) = 0.$$
rather if we extract a sufficient number of balls, it is possible to ensure, with a probability
as close to 1 as we wish, that the percentage of white balls is as close to 60% as we wish.
‘As close as we wish’ can mean between 59% and 61%, or between 59.9999% and 60.0001%
(i.e. as close as we require). Furthermore, the golden theorem provides a formula for the
number of repetitions needed to obtain this accuracy.
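Bernoulli's own formula is complicated and, as we shall see, very conservative. As a stand-in, the sketch below uses Chebyshev's inequality, a later and simpler tool, which gives the sufficient condition n ≥ p(1 − p)/(δε²), where ε is the allowed deviation and δ = 1 − confidence. This is not Bernoulli's calculation, only an illustration of how a guaranteed number of trials can be computed.

```python
from fractions import Fraction
from math import ceil

def chebyshev_trials(p: Fraction, eps: Fraction, confidence: Fraction) -> int:
    """Smallest n for which Chebyshev's inequality guarantees
    P(|m/n - p| < eps) >= confidence, since P(|m/n - p| >= eps) <= p(1 - p) / (n eps^2)."""
    delta = 1 - confidence   # allowed probability of a larger deviation
    return ceil(p * (1 - p) / (delta * eps ** 2))

# Bernoulli's urn (p = 3/5), allowing a deviation of one percentage point
p_white = Fraction(3, 5)
for conf in (Fraction(95, 100), Fraction(999, 1000)):   # 99.9%: roughly 'moral certainty'
    n = chebyshev_trials(p_white, Fraction(1, 100), conf)
    print(f"confidence {float(conf):.1%}: n >= {n:,}")
```

Fractions keep the result exact, so the rounding up with `ceil` is not disturbed by floating-point noise.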

There are two different parts to Bernoulli’s theorem. The first, and perhaps most
important, is that it is possible to obtain the required precision with a finite number
of trials. The second gives us the number of trials required to obtain this precision.
It is this second part that has a practical application in the real world. In any study
carried out using surveys, it is possible to specify the error level we are willing to
accept and determine the number of questionnaires that must be completed to ensure
this error with a given precision. For example, let us imagine that we
know the level of public support enjoyed by a public official in their city and we
wish to know with what probability, if we ask a certain number of citizens, the
percentage obtained will deviate from the true level by no more than a previously
specified amount. Using Bernoulli’s golden theorem, we can determine the number of people we
need to ask.

Bernoulli’s results were not as practical as he might have wished and his 
calculations included a large number of approximations that meant the number of 
questionnaires was excessively large. We should bear in mind that we are dealing 
with a result from the end of the 17th century, established using the tools available 
at the time. The number of trials also depends on the precision we specify, and Bernoulli always
worked with what he called ‘moral certainty’, which meant a precision of 99.9%.

The modern versions of the laws of large numbers have improved the estimation
of the number of trials required. In spite of this, even with current results from
probability theory, if we require an excessive precision, of 99.9999%, for example,
and a small deviation with respect to the desired percentage, we would need to ask
more people than live in a city! At any rate, it is clear that we must strike a balance:
we can work neither with an excessive error nor with an exaggerated precision.
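Modern estimates usually rely on the normal approximation rather than on Bernoulli's bound. The following sketch, which is an illustration rather than a survey methodology, uses `statistics.NormalDist` to compute the approximate number of people needed for a given margin of error and confidence, taking the worst case p = 0.5.

```python
from math import ceil
from statistics import NormalDist

def survey_size(margin: float, confidence: float, p: float = 0.5) -> int:
    """Approximate number of respondents so that the sample percentage is within
    `margin` of the true one with the given confidence (normal approximation).
    p = 0.5 is the worst case, giving the largest sample."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

for confidence in (0.95, 0.999, 0.999999):
    for margin in (0.03, 0.01, 0.001):
        print(f"confidence {confidence:.4%}, margin {margin:.1%}: "
              f"{survey_size(margin, confidence):,} people")
```

With a confidence of 99.9999% and a margin of 0.1%, the formula asks for several million respondents, more than the population of most cities.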

Jakob Bernoulli was 50 years old when he died, without completing the 
manuscript that contains his theorem. His editors asked his brother Johann 
to complete it, but he refused, as did his nephew Nicholas, meaning that the 
work remained unpublished for eight years, finally appearing in 1713 under the 
title Ars Conjectandi. To this day, it still makes a fascinating read. Bernoulli used
calculations to show how it was possible to obtain more in-depth knowledge 
of probability. 
The cover of Ars Conjectandi, Jakob Bernoulli's unfinished work.


The interest, depth and beauty of the problem set and solved by Bernoulli
are such that, in modern probability theory, any series of experiments repeated
independently under the same conditions, each with two possible results (such as the draws
from the urn with 5,000 balls), is referred to as a series of ‘Bernoulli trials’.


Bad luck cannot last forever! Or can it? 


The laws of large numbers state that if we observe the occurrence of an event with
a probability of p, its frequency will tend to p as we increase the number of observations.
Does this mean that if we come across a run of observations in which
the frequency is far below the predicted value, another run will have to appear at some
point to bring us back to the expected frequency (and vice versa)? Or if we
play a game in which the probability of winning is close to 1/2 and we lose a large
number of games, is it best to continue playing, because surely a run in our favour
must come up to ensure the number of games we have won is consistent with our
probability of winning? In other words, bad luck cannot last forever! At some point
we must start to win.

Let us consider a game in a casino in which the only players are the bank and
you. Another person is prowling around, watching your losses stack up.
If the observer is versed in mathematics, it might seem reasonable that, on seeing
that you have lost a large number of times, they would join the game, betting
in your favour. However, they decide not to bet. Is their behaviour logical? Yes. The laws
of large numbers do not state that the results of events balance each other out in
short runs, nor do they state that this happens over an extremely large number
of repetitions. There is nothing that tells us that a run of losses will be followed by
a run of wins, or vice versa. The laws describe what happens to the frequency in the limit,
as the number of observations grows without bound; they guarantee nothing for any
finite number of observations. In other words, chance has no memory. This is easily
verified. We ask 40 people to toss a coin 50 times each and write down the results. The 40
series will not show heads spread evenly, nor will there be runs
that balance out the frequencies. However, grouping together the 2,000 tosses, the
frequency of heads (or tails) will be close to 1/2.
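The ‘no memory’ claim is easy to test by simulation. The sketch below tosses a simulated fair coin many times and looks only at the tosses that immediately follow a run of five tails; if chance had a memory, heads would be more frequent there, but the proportion stays close to 1/2. The run length and seed are arbitrary choices.

```python
import random

def heads_rate_after_tails_run(n_tosses: int, run: int, seed: int = 0) -> float:
    """Proportion of heads on the tosses that immediately follow `run` consecutive tails."""
    rng = random.Random(seed)
    tosses = [rng.random() < 0.5 for _ in range(n_tosses)]   # True means heads
    followers = [
        tosses[i]
        for i in range(run, n_tosses)
        if not any(tosses[i - run:i])   # the previous `run` tosses were all tails
    ]
    return sum(followers) / len(followers)

print(f"heads after five tails in a row: {heads_rate_after_tails_run(1_000_000, 5, seed=7):.3f}")
```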

In spite of the above, we may have seen someone play a slot machine
until they win a prize. Does this contradict what we have said? The
player has risked losing on various occasions because they hope their luck will
change. They may even have watched the machine, observing that
other players lost, before deciding to play, thinking
that the prize is about to ‘drop’. In these games, although the results of each game
are random, the machines are legally required to return part of the money in
prizes (for example, 70%) over a cycle of consecutive games, which is often a
very large number (for example, 40,000). In fact, they are programmed not to give
out the jackpots at the end of the cycle, but to distribute them throughout it in order to
comply with the legal requirement. Every so many games, not always the same number,
the machines pay out the jackpot. Furthermore, since the results are computer-controlled,
and hence not completely random, we can understand why there are
players hunting each prize.


A few words on statistics 


Let us now imagine we repeat an experiment to observe the appearance of an event 
whose probability is unknown. We repeat the experiment many times and observe 
the result of the event. Is the relative frequency of the event a good approximation 
of the probability we would like to find? Even if we believe we know the probability 
beforehand and repeat the experiment many times, the relative frequency of the 
result of the event can lead us to change that a priori probability. 

This approach, in which we collect information from experimental
results and use it to derive conclusions about the probability of an event or other
characteristics of the experiment, is the purpose of statistics. Probability involves
deductive reasoning; statistics makes use of inductive or inferential reasoning.
Probability begins by assuming a certain probabilistic model to be true and uses it 
to draw conclusions on the results produced by an experiment. Statistics assumes a 
model for the experiment whose properties are unknown and observes the results, 
using them to derive the properties of the experiment, such as the probability that 
a given event occurs, by means of an inferential process. 

In both processes, the results contain a certain error, which both statistics and 
probability are able to quantify, each with its own rules, and in which the laws of 
large numbers or other results, such as the central limit theorem, which we shall see 
below, often play a fundamental role. 

In the case of Bernoulli’s urn, if we know its composition, we can determine the 
probability that, if we carry out n draws with replacement, the proportion of white 
balls will be close to 3/5, as we expect. We can also determine the number of draws 
we must carry out to ensure that the probability that the discrepancy between the 
relative frequency and 3/5 lies within certain limits (however narrow) is as close to 1 
as we wish. This is the probabilistic approach. 
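A minimal sketch of this probabilistic calculation, under stated assumptions: it uses the normal approximation (the central limit theorem discussed later in this chapter) rather than Bernoulli’s own, far more conservative bound, and the function name `draws_needed` is ours, purely for illustration. It returns the number of draws with replacement needed so that the relative frequency of white balls lies within a tolerance `eps` of 3/5 with a chosen probability.

```python
import math
from statistics import NormalDist


def draws_needed(p: float, eps: float, confidence: float) -> int:
    """Smallest n for which P(|freq - p| < eps) reaches `confidence`,
    using the normal approximation to the binomial distribution."""
    # z such that P(-z < Z < z) = confidence for a standard normal Z
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    # The relative frequency has standard deviation sqrt(p(1 - p) / n),
    # so we need z * sqrt(p(1 - p) / n) <= eps.
    return math.ceil(z ** 2 * p * (1 - p) / eps ** 2)


# Urn with 3/5 white balls, tolerance of one percentage point, probability 0.99
print(draws_needed(3 / 5, 0.01, 0.99))
```

With these values the approximation asks for roughly sixteen thousand draws; halving the tolerance roughly quadruples that number, since `eps` enters squared.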

However, it may also be the case that we only know there are black and white 
balls in the urn, but not how many of each or in what proportion. We begin to remove 
balls, writing down their colour and returning them to the urn. Based on the relative 
frequencies we observe for white and black balls, we can infer the proportion in the 
urn with a certain error or tolerance. More specifically, it is possible to determine the 
probability that the proportion lies within certain limits. This is the statistical approach. 
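The statistical direction can be sketched in the same way. In this illustration the urn’s composition (60% white) is a hypothetical value hidden from the estimator, the draws are simulated with a fixed seed, and `estimate_proportion` is an illustrative helper; the interval of two standard errors on either side of the estimate is a common normal-approximation choice, not the only one.

```python
import math
import random


def estimate_proportion(draws: list[bool]) -> tuple[float, float, float]:
    """Estimate the share of white balls from draws with replacement.

    Returns (estimate, lower, upper): the estimate plus or minus two
    standard errors, which gives roughly 95% confidence.
    """
    n = len(draws)
    p_hat = sum(draws) / n  # True counts as 1 (white), False as 0 (black)
    half_width = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Hypothetical urn: the experimenter does not know that 60% of the balls are white.
rng = random.Random(1)
hidden_share = 0.6
draws = [rng.random() < hidden_share for _ in range(2000)]
p_hat, low, high = estimate_proportion(draws)
print(f"estimate {p_hat:.3f}, interval [{low:.3f}, {high:.3f}]")
```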

This forms the basis of the surveys that are so common in our society. In order to 
determine the voting preferences for a candidate for mayor, we ask a large number of 
people (draw balls), note down their answers (colours) and infer, with a certain level of 
confidence (probability), the percentage of support among the electorate (percentage 
of white balls) for this candidate (urn), with a certain margin of error (discrepancy). It 
may seem surprising, but with a confidence level (probability) of 0.955, quite close to 1, 
it is possible to estimate the support for a candidate with an error of about 3% (that is, 
we can be wrong by at most about 3%) by asking just 1,000 people. If we only ask 
900 people, the error increases to 3.3%. An approximate way of determining the 
error of a survey with this level of confidence is to calculate 

$$\frac{1}{\sqrt{n}},$$

where n is the number of people surveyed. 
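The rule of thumb can be checked directly. Assuming a confidence level of about 0.955, which corresponds to two standard deviations of the normal law, and the least favourable case in which support is 50%, the margin 2√(p(1 − p)/n) reduces to 1/√n. The helper `survey_margin` is hypothetical, written only for this illustration.

```python
import math


def survey_margin(n: int, p: float = 0.5, z: float = 2.0) -> float:
    """Approximate margin of error for a proportion estimated from n answers.

    With z = 2 (confidence about 0.955) and the worst case p = 0.5,
    z * sqrt(p * (1 - p) / n) simplifies to 1 / sqrt(n).
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (900, 1000):
    print(n, f"{survey_margin(n):.2%}", f"{1 / math.sqrt(n):.2%}")
```

For 1,000 people the margin comes out at about 3.2%, and for 900 at about 3.3%.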
The Gauss bell curve and normality 


Sir Francis Galton (1822-1911) said: “The larger the mob, the greater the apparent 
anarchy, the more perfect is its sway. It is the supreme law of unreason. Whenever a 
large sample of chaotic elements are taken in hand and marshalled in the order of 
their magnitude, an unsuspected and most beautiful form of regularity proves to 
have been latent all along.” 

Bernoulli was concerned with proving that the probability that the discrepancy 
between the relative frequency and the probability is smaller than any given amount, 
however small, tends to one as the number of trials grows. However, he never 
concerned himself with evaluating this probability precisely, which is done using the 
normal distribution and the central limit theorem. 


The normal curve 


Abraham de Moivre (1667-1754) proposed calculating the sum of the terms that 
appear in Newton’s binomial $(a+b)^n$, where n is a large natural number. He linked 
the problem to the probability of an event, where n is the number of repetitions of 
the experiment that results in the event. He began by studying the case of $(1+1)^n$, 
associated with an experiment with just two results, each of which is equally probable. 
De Moivre wished to calculate the probability that, in n repetitions of the experiment, 
the number of times the event in question (which has the same probability as its 
complement) appears is close to n/2. He showed that a good approximation of the 
probability that this number lies within $t\sqrt{n}/2$ of $n/2$ can be obtained using 
the integral 

$$\frac{2}{\sqrt{2\pi}}\int_0^{t} e^{-x^2/2}\,dx,$$

which gives us what is now referred to as the normal distribution. 
The Gaussian or normal distribution is a probability distribution determined 
by the density function (the continuous equivalent of the probability function for 
discrete variables) 

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},$$

where $\mu$ and $\sigma^2$ are the two parameters that define it: its mean and its variance. 
Its graph, commonly referred to as the Gauss bell curve, is as follows: 
Laplace would refine De Moivre’s result, proving that for an experiment in which 
an event E can be obtained with probability p or its complement (with probability 
1-p), when the experiment is repeated n times under identical conditions to give 
m occurrences of event E, the probability that the difference between m/n and p 
lies between the specific values 

$$-\frac{t\sqrt{2p(1-p)}}{\sqrt{n}} \quad\text{and}\quad \frac{t\sqrt{2p(1-p)}}{\sqrt{n}}$$

is approximately 

$$\frac{2}{\sqrt{\pi}}\int_0^{t} e^{-x^2}\,dx + \frac{e^{-t^2}}{\sqrt{2\pi n\,p(1-p)}}.$$


This result is an early version of what is now known as the central limit theorem, 
which is key to explaining the predictive power of a large part of statistical tools. 
Beyond the formulae used to express it, the theorem states that given a sufficiently 
large sample, the properties of the sample will come to replicate the properties of 
the population from which it originates, meaning that the latter can be studied using 
probabilistic and statistical methods. 
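A short numerical check of Laplace’s approximation, as a sketch. The main term (2/√π)∫₀ᵗ e^(−x²) dx equals the error function erf(t), which Python provides as `math.erf`; the sketch assumes the small correction term has the form e^(−t²)/√(2πnp(1 − p)). The exact binomial probability is computed by summing the terms of Newton’s binomial directly; the values n = 1,000, p = 0.5 and t = 1.5 are arbitrary illustrative choices, as are the helper names.

```python
import math


def exact_probability(n: int, p: float, t: float) -> float:
    """Exact binomial probability that |m/n - p| <= t * sqrt(2p(1-p)/n)."""
    bound = t * math.sqrt(2 * p * (1 - p) / n)
    return sum(
        math.comb(n, m) * p ** m * (1 - p) ** (n - m)
        for m in range(n + 1)
        if abs(m / n - p) <= bound
    )


def laplace_main_term(t: float) -> float:
    """(2 / sqrt(pi)) * integral from 0 to t of exp(-x^2) dx, which equals erf(t)."""
    return math.erf(t)


def laplace_with_correction(n: int, p: float, t: float) -> float:
    """Main term plus the assumed correction exp(-t^2) / sqrt(2 pi n p (1-p))."""
    return laplace_main_term(t) + math.exp(-t * t) / math.sqrt(2 * math.pi * n * p * (1 - p))


n, p, t = 1000, 0.5, 1.5
print(f"exact:           {exact_probability(n, p, t):.4f}")
print(f"main term:       {laplace_main_term(t):.4f}")
print(f"with correction: {laplace_with_correction(n, p, t):.4f}")
```

All three numbers agree to within a few thousandths, which is the sense in which the normal curve "replaces" the long binomial sum.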

De Moivre’s idea was to add up the terms of the rows of Pascal’s Triangle far 
down the triangle (that is, when n is large). Today, the calculation can be carried out 
precisely and quickly using a standard computer. However, note that in the 200th row, 
for example, there are terms with 59 digits (ordinary calculators are of no use), and in 
the 18th century the tools for calculation were not what they are now. 
The problem De Moivre set out to tackle was, as he himself recognised, extremely 
difficult. It was these computational limitations that obliged Bernoulli and De Moivre 
to make use of approximations, which improved in line with the development of new 
techniques and calculating tools. 
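With arbitrary-precision integers the size of these numbers is easy to check. The sketch below assumes that "the 200th row" means the coefficients of (a + b)^200; its largest coefficient is C(200, 100).

```python
import math

# Largest coefficient in the row of Pascal's Triangle for (a + b)^200
largest = math.comb(200, 100)
print(len(str(largest)))  # 59 decimal digits

# Exact arithmetic also confirms that the row adds up to 2^200
print(sum(math.comb(200, k) for k in range(201)) == 2 ** 200)  # True
```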

Let us return again to Pascal’s Triangle. Take the numbers of a row, such as the fifth 
one, and draw a graph in which each number is represented by a bar whose height is 
proportional to its value: 

[Figure: bar chart of the numbers in the fifth row of Pascal’s Triangle] 

Doing the same thing with row 16 gives the following graph: 

[Figure: bar chart of the numbers in row 16 of Pascal’s Triangle; vertical axis from 0 to 14,000] 

As we make our way down Pascal’s Triangle, the successive graphs grow closer to 
the normal curve. In other words, sums of these coefficients can be approximated 
using the normal distribution, and this was De Moivre’s achievement. 
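The convergence can be measured. In this sketch each row n is divided by 2^n, so that its numbers become the probabilities of 0, 1, …, n heads in n fair tosses, and compared with the normal density of mean n/2 and standard deviation √n/2; the function name `max_gap_to_normal` and the choice of rows 4, 16 and 64 are illustrative.

```python
import math


def max_gap_to_normal(n: int) -> float:
    """Largest gap between row n of Pascal's Triangle, divided by 2^n,
    and the normal density with mean n/2 and standard deviation sqrt(n)/2."""
    mu, sigma = n / 2, math.sqrt(n) / 2
    gaps = []
    for k in range(n + 1):
        share = math.comb(n, k) / 2 ** n
        density = math.exp(-((k - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        gaps.append(abs(share - density))
    return max(gaps)


print([math.comb(16, k) for k in range(17)])  # row 16, peaking at 12,870
for n in (4, 16, 64):
    print(n, round(max_gap_to_normal(n), 4))  # the gap shrinks as n grows
```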
The law of error 


Throughout history, the observation of the firmament, the planets, stars and comets, 
has been one of the most widely studied areas of science, since understanding and 
knowledge of the Solar System (and of other systems beyond our own) are regarded 
as necessary to understanding our own planet and its life. 

In the 18th century, scientists who were working in physics, mathematics and 
celestial mechanics commonly encountered the problem of discrepancies in their 
observations. Indeed, it would appear that as far back as the 16th century, the practice 
of taking various measurements of an unknown quantity under the same conditions 
(or as similar as possible) in order to obtain a reliable value was widespread. As a 
result, there were various measurements, and the problem lay in deciding the correct 
value of the magnitude. One possibility was to take the arithmetic mean as the ‘best’ 
value. However, alongside this approach, in the 18th century there was another 
school of thought that took an observation carried out with ‘special care’ as the 
true value. The controversy continued until the appearance of the normal distribution 
as a suitable model for errors. 

The normal distribution appeared in full force when Carl Friedrich Gauss (1777-1855) 
took it as the correct distribution for errors in the least squares method, developed 
as a method for adjusting astronomical measurements. The basis of the method is 
as follows: a parameter that we can measure (with error) repeatedly depends on a 
series of magnitudes that are not subject to error. 

We seek the best function for representing the parameter, in the sense that the 
sum of the squares of the discrepancies between this function and our observations 
of the parameter is as small as possible. (Squaring the discrepancies makes those 
above and below the function count in the same way.) 

Assuming the errors followed the normal law, Gauss showed that the function 
obtained by the method of least squares is the one that makes the observations 
most plausible in probabilistic terms, that is, the one that assigns maximum 
probability to the observations. The normal distribution came to be recognised as 
an important distribution in the study of errors, and the least squares method was 
adopted in astronomy and geodesy. Furthermore, astronomical observations made it 
possible to verify the consistency between observations and the normal law. 
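A minimal sketch of the least squares idea with made-up data. For a straight line y = a + bx, where x plays the role of a magnitude not subject to error, the closed-form slope and intercept below minimise the sum of squared discrepancies; for repeated measurements of a single quantity, the minimising value is simply the arithmetic mean. All numbers and helper names are hypothetical; Gauss’s astronomical problems involved more parameters, but the principle is the same.

```python
def least_squares_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Slope and intercept minimising the sum of squared vertical discrepancies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sum_of_squares(c: float, obs: list[float]) -> float:
    """Sum of squared discrepancies between a candidate value c and the observations."""
    return sum((o - c) ** 2 for o in obs)


# Hypothetical repeated measurements of one quantity: the least squares value is the mean.
obs = [10.2, 9.9, 10.1, 10.4, 9.8]
mean = sum(obs) / len(obs)
print(mean, [round(sum_of_squares(c, obs), 3) for c in (mean - 0.1, mean, mean + 0.1)])

# Hypothetical observations depending linearly on an error-free magnitude x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
print(least_squares_line(xs, ys))
```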




THE ADVANTAGES OF BEING ‘NORMAL’ 


The hypothesis of elementary errors 


One way of supporting the normal law as the law of error came from the elementary 
theory of errors. It consists of assuming that each error made when making an 
observation or measuring a parameter is simply the sum of a large number of errors 
caused by various independent sources, each of which is small in comparison to the 
sum. This principle was used to prove that the total error distribution conforms to 
the normal law. This is what is known as the central limit theorem. 

The principle of elementary errors became widely accepted throughout the 19th 
century. Numerous scientists made attempts to prove that under this assumption, 
the probability law that governs the total error is the normal law, including the 
German Friedrich Wilhelm Bessel (1784-1846), who used the least squares method 
in multiple applications and provided empirical evidence to support the normal law 
as the law of error, as proposed by Gauss. Bessel introduced the following condition: 


“An observational error is the sum of a large number of elementary errors, due 
to different and mutually independent causes. No elementary error significantly 
exceeds the others, and positive and negative errors of the same magnitude appear 
with the same frequency. Furthermore, the laws governing the elementary errors 
are not necessarily the same.” 


Bessel’s hypothesis contains two important aspects. None of the errors is of 
special importance, and the distribution of positive and negative errors is symmetric. 
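Bessel’s hypothesis can be imitated with a simulation, as a sketch under invented assumptions: 50 elementary errors of different sizes (none dominating), half of them taking the values +a or −a with equal probability and half uniform on [−a, a], so that the elementary laws differ but are all symmetric. If the total error follows the normal law, about 68.3% of totals should fall within one standard deviation and about 95.4% within two.

```python
import math
import random


def total_error(rng: random.Random, sizes: list[float]) -> float:
    """Sum of independent elementary errors, each symmetric about zero.

    Odd-indexed errors are +size or -size with equal probability; even-indexed
    ones are uniform on [-size, size], so the elementary laws are not all the same.
    """
    total = 0.0
    for i, a in enumerate(sizes):
        if i % 2:
            total += a if rng.random() < 0.5 else -a
        else:
            total += rng.uniform(-a, a)
    return total


rng = random.Random(42)
sizes = [rng.uniform(0.5, 1.5) for _ in range(50)]  # no size dominates the others
# Exact variance of the total: a^2 for the +/-a errors, a^2/3 for the uniform ones
sigma = math.sqrt(sum(a * a if i % 2 else a * a / 3 for i, a in enumerate(sizes)))

samples = [total_error(rng, sizes) for _ in range(10_000)]
within_1 = sum(abs(s) <= sigma for s in samples) / len(samples)
within_2 = sum(abs(s) <= 2 * sigma for s in samples) / len(samples)
print(round(within_1, 3), round(within_2, 3))  # compare with 0.683 and 0.954
```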

When a teacher has to grade their students’ work, they assign a numeric mark 
between 0 and 10. Are they really able to explain the difference between an 8.8 and 
an 8.9? Let us dare to suggest they cannot. When they carry out the marking, they 
are subject to circumstances under which errors may appear. They read the fifth 
piece of work and the fortieth differently as they start to grow tired; the way in which 
students express themselves differs; and the deadline for submitting the marks can 
mean that the time spent marking each exam varies considerably. There are various 
factors that can influence the marking, both for and against: various sources of 
elementary error! We can assume that the numerical mark is subject to a certain 
variability due to the accumulation of causes, without this meaning that it is unfair. 
Perhaps this is the basis of a certain legend that says that teachers mark their 
students’ work in line with the normal law, or adjust their marks to conform to it. 
This is just one example that shows how the problem of error is a current and 
common one. 
The central limit theorem 


The central limit theorem (CLT) states that if a measurement is the sum of a large 
number of factors (subject to error) that act independently, none of which dominates 
the total, then the total is approximately distributed according to the normal law, 
whatever the laws of probability that govern the individual factors. 

Let us analyse a specific situation to understand just what this means. In terms 
of the coin we have tossed so many times, the proportion of heads obtained tends 
towards 50% as we increase the number of tosses, and the CLT tells us that the 
probability that the proportion lies between two specific limits can be determined 
using the normal distribution. For example, if we toss the coin 100 times, the 
probability that the proportion of heads is between 48% and 52% is 0.3108; if we 
toss it 400 times, it will be 0.5763; and if we toss it 5,000 times, this increases to 
0.9953. With 5,000 tosses, we have a high degree of certainty that the percentage of 
heads will be between the values we have mentioned. These probabilities have been 
determined using the normal distribution. 
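These probabilities can be reproduced as follows. The sketch assumes the plain normal approximation without a continuity correction: the share of heads in n fair tosses is treated as normal with mean 0.5 and standard deviation √(0.25/n). The helper name `prob_share_of_heads` is illustrative.

```python
import math


def prob_share_of_heads(n: int, low: float, high: float) -> float:
    """Normal approximation to P(low <= share of heads <= high) in n fair tosses."""
    sd = math.sqrt(0.25 / n)  # standard deviation of the share of heads

    def cdf(x: float) -> float:
        return 0.5 * (1 + math.erf((x - 0.5) / (sd * math.sqrt(2))))

    return cdf(high) - cdf(low)


for n in (100, 400, 5000):
    print(n, round(prob_share_of_heads(n, 0.48, 0.52), 4))
```

Each quadrupling of the number of tosses halves the standard deviation of the share, which is why the probability of staying between 48% and 52% climbs so quickly.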

Another scenario arises when a company that manufactures energy bars states 
on the packet that each weighs 60 grams. On some occasions, the real weight will 
exceed the declared weight, whereas on others it will be below it. However, the 
weights will conform to the normal distribution and, reasonably, the weight declared 
on the packet will be the average weight. The normal distribution allows us to 
determine the probability that the weight of a bar is within specific limits; to do so, 
we need to check that the weights do indeed satisfy probabilistic normality. 
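As an illustration, suppose (purely hypothetically, since the text gives no spread) that the weights have mean 60 g and standard deviation 2 g. Python’s standard `statistics.NormalDist` then gives the probabilities directly.

```python
from statistics import NormalDist

# Hypothetical figures: mean 60 g (as declared on the packet), standard deviation 2 g
weights = NormalDist(mu=60, sigma=2)

p_within = weights.cdf(62) - weights.cdf(58)  # bar weighs between 58 g and 62 g
p_under = weights.cdf(57)                     # bar weighs less than 57 g
print(f"{p_within:.4f} {p_under:.4f}")
```

With these assumed values, about 68% of bars weigh within 2 g of the declared weight and fewer than 7% weigh under 57 g.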

The two examples above also allow us to point out two aspects that gave rise 
to the search for the mathematical explanations from which the CLT is derived: 
determining the probability that the average of the errors of the observations 
(historically in astronomy and geodesy) lay within certain limits; and the development 
and acceptance of the elementary error hypothesis as the basic principle for assuming 
the distribution of errors of a measurement process follows the normal law. 

A mathematical formulation of a simple version of the CLT states that: “If $X_1$, 
$X_2, \ldots$ are independent random variables (measurements) with the same distribution 
and a finite common average value $\mu$, and whose squares also have a finite average 
value, the probability that the normalised sum of $X_1+X_2+\cdots+X_n$ lies between the 
values a and b is approximated by the area under the normal curve between a and b 
as n grows indefinitely”. In precise mathematical terms, it states: 
$$\lim_{n\to\infty} P\left(a \le \frac{(X_1+X_2+\cdots+X_n)-n\mu}{\sigma\sqrt{n}} \le b\right) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-x^2/2}\,dx,$$

where $\mu$ is the common average value of the variables and $\sigma^2 = E(X^2)-\mu^2$ is the variance. 

The CLT can be summarised by saying that the variables satisfy probabilistic 
normality. It is important to point out that the CLT, or probabilistic normality, 
holds whatever the distribution of the individual variables. In a certain sense, what 
the theorem says is that we cannot predict the individual behaviour of a variable or 
individual, but we can predict the average behaviour of a population. 
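A small simulation makes the statement concrete, as a sketch with arbitrary choices: each variable is the roll of a fair die (a distribution that is far from normal, with mean μ = 3.5 and variance σ² = 35/12), n = 30 rolls are added, and the sum is normalised as (S − nμ)/(σ√n). The share of normalised sums between a = −1 and b = 1 is compared with the area under the normal curve between the same limits.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(7)
n, trials = 30, 10_000
mu, sigma = 3.5, math.sqrt(35 / 12)  # mean and standard deviation of one die roll


def normalised_sum() -> float:
    """(X1 + ... + Xn - n*mu) / (sigma * sqrt(n)) for n simulated die rolls."""
    s = sum(rng.randint(1, 6) for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))


a, b = -1.0, 1.0
values = [normalised_sum() for _ in range(trials)]
simulated = sum(a <= v <= b for v in values) / trials
predicted = NormalDist().cdf(b) - NormalDist().cdf(a)  # area under the normal curve
print(round(simulated, 3), round(predicted, 3))
```

The two figures differ by only a few thousandths; part of that gap comes from the sum of dice being a whole number, not from a failure of the theorem.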

Both at the time and to this day, the CLT provides a more in-depth knowledge 
of the behaviour of observations and their relationship with the physical world. The 
same ideas have been applied to the study of both physical and social properties 
of human populations and, to the surprise of many, it has been shown that, in 
many cases, the behaviour of the properties of human populations conforms to that 
predicted by the central limit theorem. 


The Galton board 


The board devised by Galton, known as the Galton board, is a device for 
experimentally checking probabilistic normality and the central limit theorem. It is 
a vertical board with a series of nails arranged in the shape of Pascal’s Triangle. A 
nail is placed below a funnel, into which we insert identical balls, and beneath it are 
another two, one to the right and one to the left. Beneath them are other nails, 
located to the left and the right of those in the row above, and so on. 

When a ball passes through the successive levels, it knocks against the nails, which 
send it to the right or left as it falls. Finally, the balls are collected in a series of 
compartments located below each of the spaces under the last row of nails. The 
piles of collected balls take on the shape of the normal curve. 

The explanation of the process is as follows. At the start, a ball represents the 
error we make when measuring a property. This error of measurement (or random 
variable) can only take two values (left or right, above or below), which have equal 
probabilities. The value corresponds to the deviation caused by the nail located 
below the funnel when the ball hits it. When the ball drops down to the row below, a 
new error occurs, which is added to the previous one. If we imagine that the row-by-row 
advance of each ball represents the accumulation of errors, the end result, 
which consists of observing the place in which the ball comes to rest, represents the 
sum of as many errors as there are rows in the device. If we superimpose Pascal’s 
Triangle onto this device, the numbers of the triangle indicate the number of paths 
that lead to each nail. 


A Galton board. 
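A Galton board is easy to simulate, as a sketch under the idealised assumptions described above: each nail sends a ball left or right with probability 1/2, independently of everything else, so the compartment a ball lands in equals its number of rightward deflections. The expected count in each compartment is the corresponding number in Pascal’s Triangle, scaled by the number of balls and divided by 2^rows. The row and ball counts are arbitrary.

```python
import math
import random


def galton_board(rows: int, balls: int, seed: int = 0) -> list[int]:
    """Drop balls through `rows` rows of nails and count balls per compartment."""
    rng = random.Random(seed)
    counts = [0] * (rows + 1)
    for _ in range(balls):
        # Each row adds an elementary "error": one step right with probability 1/2
        rights = sum(rng.random() < 0.5 for _ in range(rows))
        counts[rights] += 1
    return counts


rows, balls = 10, 5000
counts = galton_board(rows, balls)
expected = [balls * math.comb(rows, k) / 2 ** rows for k in range(rows + 1)]  # Pascal's row
for k, (c, e) in enumerate(zip(counts, expected)):
    print(f"compartment {k:>2}: {c:>5} balls (expected about {e:.0f})")
```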
Chapter 7 


Probability in Society 


Life tables 


Life expectancy and mortality tables (LMT) are a statistical tool that makes it possible 
to study the incidence of mortality in various populations over a given period of 
time. One of their most common applications is life insurance. The first such tables 
were created by John Graunt who, in the second half of the 17th century, together 
with William Petty, prepared the first population statistics, which would become the 
forerunners of modern demography. 


JOHN GRAUNT (1620-1674) 


Graunt was the owner of a haberdashery that 
provided him with enough money to allow him to 
dedicate himself to interests as far removed from 
his business as studying the population of London. 
The Bills of Mortality were records, dating from the 
16th century, of christenings and burials (with the age 
and apparent cause of death) for the various parishes 
of London, which allowed the authorities to keep track 
of the epidemics that devastated the city. Weekly 
records had been kept since 1603, one of the worst 
years of the plague. Graunt studied the information, 
organising it into what are now referred to as life tables, 
and analysed various social aspects of the population, such as 
how the number of burials exceeded the number of christenings in the city, in contrast to the 
countryside. This marked the beginning of demographic studies. In 1662 Graunt published his 
study of the Bills of Mortality (its full title is Natural and Political Observations Mentioned in a 
Following Index, and Made Upon the Bills of Mortality), which was received with considerable 
interest in English society and gained him entry to the Royal Society. 
After studying mortality data, Graunt took a selection of 100 people and showed 
how this initial group diminished over the years (due to the deaths of its members): 

At time of               Survivors
Birth                    100
End of year 6            64
End of year 16           40
End of year 26           25
End of year 36           16
End of year 46           10
End of year 56           6
End of year 66           3
End of year 76           1
End of year 86           0


He observed that in 17th-century London, infant mortality during the first six 
years of life was 36%, and that only 1% of people lived beyond 76. 
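The figures quoted can be read off the table by subtracting consecutive survivor counts. This small sketch simply transcribes Graunt’s table; the variable names are illustrative.

```python
# Graunt's survivors out of 100 births: at birth and at the end of years 6, 16, ..., 86
ages = [0, 6, 16, 26, 36, 46, 56, 66, 76, 86]
survivors = [100, 64, 40, 25, 16, 10, 6, 3, 1, 0]

# Deaths in each interval are the drop in survivors between consecutive ages
deaths = [s0 - s1 for s0, s1 in zip(survivors, survivors[1:])]
for a0, a1, s0, dead in zip(ages, ages[1:], survivors, deaths):
    print(f"ages {a0:>2}-{a1:>2}: {dead:>2} deaths ({dead / s0:.0%} of those alive at {a0})")
```

The first line of output reproduces the 36% mortality in the first six years, and the last shows the single survivor beyond 76.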

The only major differences between modern LMTs and the one devised by Graunt 
are that the date of birth is used instead of the date of conception, the size of the 
sample is different, and modern tables include more information. The idea can be 
applied to various situations or populations, which do not necessarily need to be 
human. The main purpose of the tables is to study mortality or, in positive terms, the 
average remaining lifetime, or life expectancy, of the individuals of a population. 

LMTs for human populations are prepared both at national and at smaller levels, 
and are grouped by geographic, ethnographic and administrative criteria. The method 
used for their preparation is set out in the protocols of the Human Mortality Database 
(HMD). However, they may be adjusted as required to suit the population being 
studied. 

Originally, the fundamental characteristic of an LMT was life expectancy, but 
tables now include various biometric functions that allow the analysis of various 
characteristics of the population related to the life expectancy of the individuals. 
The basic biometric functions that are often included are described below. All of 
these functions refer to each of the ages included in the tables. 
Life expectancy, EX(x) 


Life expectancy indicates the average number of years, beyond the current age x, 
that an individual has left to live if the standards of living among the population remain 
constant. EX is an average based on the experience of a hypothetical group of people 
from the same population. The technology now available makes it possible to update 
the data for calculating this average on an annual basis, based on real mortality data, 
hence increasing the reliability of the information. 


Probability of death, q(x) 


In spite of the name, life tables often show the number of deaths expected per 1,000 
members of the population, that is, the probability multiplied by 1,000. This value 
is also referred to as the ‘risk of death’. 


Theoretical deaths, d(x) 


The number of theoretical deaths corresponding to each of the ages in the table. 


Survival, L(x) 


Denotes the number of individuals in the population who will live to a specific age. 


Average years lived in the last year of life, m(x) 


This is the average time lived after reaching age x by individuals in the population 
who die at this age. 


Stationary population at age x, SP(x) 


This is the total number of years lived between ages x and x+1 by the individuals 
of the population who reach age x. 
SOME FORMULAE 


Among this tangled web of numbers, there are relationships that help us understand how 
the calculations are carried out. Some of these are obvious, such as the one stating that the 
number of survivors at age x+1 is equal to the number of survivors at the previous age minus 
the number of theoretical deaths at that age: 

$$L(x+1) = L(x) - d(x).$$


Similarly, it is clear that the probability of death at age x is equal to the ratio of 
theoretical deaths to survivors at that age: 

$$q(x) = \frac{d(x)}{L(x)}.$$


Life expectancy represents the average number of years left to live for an individual of age x 
who belongs to the initial group. Its value is given by the quotient of the total time (in years) 
left to the individuals in the group from age x until its complete extinction and the 
number of survivors at that same age x. Hence: 

$$EX(x) = \frac{\sum_{y \ge x} SP(y)}{L(x)}.$$


Given that each individual who survives to age x+1 contributes one year to the total number of 
years that make up the stationary population at age x and, on average, each of those who die at 
age x contributes m(x) years, the stationary population is estimated using the expression: 

$$SP(x) = L(x+1) + m(x)\,d(x).$$
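The four relationships can be combined into a small routine, as a sketch. The cohort below (100 people, extinct by age 5, each death assumed to occur at mid-year, so m(x) = 0.5) is entirely hypothetical, and the function name `life_table` is ours. The last lines check the recurrences against the printed 2008 Spanish figures for ages 64 and 65; the stationary population agrees only to about a thousandth because m(x) is printed rounded.

```python
def life_table(L: list[float], m: list[float]) -> dict[str, list[float]]:
    """Biometric functions for a closed cohort.

    L[x] is the number of survivors at age x (ages 0..N, with L[N] == 0) and
    m[x] the average years lived in the last year of life by those dying at age x.
    """
    ages = range(len(L) - 1)
    d = [L[x] - L[x + 1] for x in ages]          # from L(x+1) = L(x) - d(x)
    q = [d[x] / L[x] for x in ages]              # q(x) = d(x) / L(x)
    SP = [L[x + 1] + m[x] * d[x] for x in ages]  # SP(x) = L(x+1) + m(x) d(x)
    EX = [sum(SP[x:]) / L[x] for x in ages]      # EX(x) = (sum of SP(y) for y >= x) / L(x)
    return {"d": d, "q": q, "SP": SP, "EX": EX}


# Hypothetical cohort of 100 people, extinct by age 5, each death at mid-year
table = life_table([100, 90, 70, 40, 10, 0], [0.5] * 5)
print(table["EX"][0])  # 2.6

# Check against the printed 2008 Spanish figures for ages 64 and 65
L64, d64, m64 = 89_779.097834, 779.849021, 0.508596
L65_printed, SP64_printed = 88_999.248813, 89_395.876772
print(round(L64 - d64, 6), L65_printed)           # survivors at 65
print(round(1000 * d64 / L64, 4))                 # risk of death per 1,000 at 64
print(round(L65_printed + m64 * d64, 2), SP64_printed)
```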


Life tables in Spain 


By way of example, the 1991 and 2008 LMTs are provided for the Spanish 
population, without distinguishing between sexes and for a sample of 100,000 
people. They contain the biometric functions only for the first years of life, for 
the decades and for the final years. In the tables, the risk of death expresses the 
number of deaths per 1,000 people, and the survivors column gives the number of 
individuals per 100,000 who reach the corresponding age. Furthermore, for the age 
group of 100 or over, the average number of years lived in the last year of life 
corresponds to the average number of years remaining after reaching 100. Let us 
compare the statistics for 2008 with those for 1991. 
EX(x): life expectancy; q(x): risk of death (per 1,000); d(x): theoretical deaths; L(x): survivors; 
m(x): average years lived in the last year of life; SP(x): stationary population. 

Life table for Spain, 1991 (both sexes, per 100,000 people) 

Age   EX(x)       q(x)          d(x)           L(x)             m(x)       SP(x)
0     77.081728   7.203662      720.366150     100,000.000000   0.143286   99,382.85
1     76.639988   0.661279      65.651531      99,279.633850    0.456575   99,243.96
2     75.690400   0.402539      39.937448      99,213.982319    0.480805   99,193.25
3     74.720687   0.298234      29.577085      99,174.044870    0.471074   99,158.40
10    67.832377   0.221059      21.889151      99,019.491510    0.457712   99,007.62
60    21.561757   8.564080      759.484760     88,682.590420    0.506036   88,307.43
65    17.590282   13.629036     1,146.193043   84,099.350342    0.493264   83,518.53
70    13.907275   22.260000     1,720.168525   77,276.213859    0.50473    76,424.27
80    7.676978    66.739361     3,547.145111   53,149.221627    0.504951   51,393.21
100   2.706079    1,000.000000  822.631577     822.631577       2.706079   2,226.11

Life table for Spain, 2008 (both sexes, per 100,000 people) 

Age   EX(x)       q(x)          d(x)           L(x)             m(x)       SP(x)
0     81.241026   3.465529      346.552939     100,000.000000   0.137237
2     79.547457   0.145453      14.490482      99,623.110434    0.513622   99,616.062577
3     78.558955   0.129156      12.865094      99,608.619953    0.562170   99,602.987224
10    71.617435   0.074764      7.441380       99,531.332011    0.422968   99,527.038100
20    61.757984   0.382588      37.997440      99,316.952351    0.517152   99,298.605348
30    51.980293   0.447539      44.274141      98,928.107308    0.494027   98,905.705789
40    42.290481   1.154064      113.408522     98,268.813976    0.493815   98,211.408302
50    32.971282   3.121249      301.106291     96,469.799099    0.490568   96,316.406013
60    24.170848   6.846135      632.746012     92,423.825316    0.500213   92,107.586949
64    20.821734   8.686309      779.849021     89,779.097834    0.508596   89,395.876772
65    19.999726   9.334626      830.774669     88,999.248813    0.490080   88,575.620021
66    19.183557   8.982544      791.977173     88,168.474144    0.503076   87,774.921405
70    15.970797   14.674447     1,236.837873   84,285.144687    0.519073   83,690.315864
80    9.037833    46.184531     2,996.543216   64,881.966822    0.502567   63,391.386179
100   2.004780    1,000.000000  2,019.430719   2,019.430719     2.004780   4,048.513560
Comparing the tables for 1991 and 2008, we can see there has been a considerable 
improvement in the rate of infant mortality, which decreased by more than 50% 
in this period (7.2 deaths per thousand births in 1991, compared with 3.46 in 2008). 
The situation is similar for early ages. This has a considerable impact on the calculation 
of average life expectancy, which increased from 77.08 years at birth in 
1991 to 81.24 years in 2008, an increase of more than four years! However, if we 
look at life expectancy on reaching 65, the official Spanish retirement age at 
the time, it becomes apparent that life expectancy at this point increased from 
17.59 years in 1991 to almost 20 in 2008, that is, an increase of almost 2.5 years, 
considerably less than the increase in life expectancy at birth. 
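The comparison amounts to a few subtractions and one ratio, reproduced here from the figures quoted above (the 2008 value at 65 is the table’s 19.999726).

```python
# Figures for Spain, both sexes, quoted in the comparison
infant_1991, infant_2008 = 7.2, 3.46     # infant deaths per 1,000 births
ex0_1991, ex0_2008 = 77.08, 81.24        # life expectancy at birth
ex65_1991, ex65_2008 = 17.59, 19.999726  # life expectancy at 65

print(f"fall in infant mortality: {1 - infant_2008 / infant_1991:.0%}")
print(f"gain in life expectancy at birth: {ex0_2008 - ex0_1991:.2f} years")
print(f"gain in life expectancy at 65: {ex65_2008 - ex65_1991:.2f} years")
```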

A more detailed analysis of each of the age groups is of fundamental 
importance in demographic studies and constitutes an essential tool to help 
organise societies, making it possible to estimate everything from pension costs 
to possible health expenditure. 

Special tables, such as LMTs by sex or for ‘disability-free’ populations, according to 
official terminology, are drawn up to study these or other properties of interest. 
Similarly, life expectancy has been incorporated into ‘well-being’ and ‘human 
development’ indexes measuring the circumstances of countries in the international 
context. However, considering the use and application of LMTs in greater depth 
is beyond the scope of this section, and we leave it to the reader to consult further 
sources on these issues. 


Insurance 


One of the most lucrative businesses in our society is selling insurance. We are of- 
fered a range of insurance products: life, car, health, home, etc., and behind all of 
these lies a series of mathematical calculations that take into account both our risk 
(the probability that the companies must pay out) and the period for which we can 
make claims for the insured concept. 

One of the most popular products is life insurance, whereby a person insures 
their life, meaning that when they die, their family receives a certain quantity of 
money to help cope with their absence. Determining the instalment they must pay 
and the sum received in the event of their death is based on a series of statistical and 
mathematical calculations that attempt to predict the remaining life expectancy of 
the policyholder. Clearly, it is impossible to predict the lifespan of an individual, 
but in large populations it is possible to observe a series of regularities that allow 
the use of probabilistic and statistical tools to estimate the average life expectancy 
of people in a given group. 

The existence of insurance dates back further than we might imagine: there 
are insurance documents relating to shipwrecks and goods sent on boats from ancient 
Greece and Rome. In the Middle Ages, pilgrims travelling to the Holy Land were 
able to take out ‘rescue insurance’, whereby paying an instalment meant that if they 
were kidnapped on their journey and a ransom was demanded, it would be paid by the 
insurer. However, there was no equivalent form of insurance centred on the death 
of the insured party. Not only did such insurance not exist, but it was regarded as 
sacrilegious, since life and death were governed by the will of God and could not 
therefore be an object of study. Insuring life (or death) was to go against 
the designs of the Creator. 

The Renaissance saw the relaxation of morals and contributed to the rise 
of business activities and, consequently, life insurance. However, countries with 
stricter moral rules continued to disallow life insurance, and where persuasion was 
not enough, outright prohibition was used, as was the case in Spain in 1570 and 
Holland in 1598. Just a century later, the calculation of the probability of death and 
life expectancy were firmly entrenched and nobody objected to them. The more 
reluctant limited themselves to commenting that the statistical regularities observed 
in human populations were also a sign of the divine order. 

The calculation of an insurance instalment or premium follows a similar process 
to the calculation for a game of chance: the companies determine the expected cost 
of all the possible risks they are insuring, multiplying the probability of each risk 
(estimated from the data for recent years) by its average cost and adding together 
the results; to this quantity they add overheads and the profit they wish to obtain. In 
terms of games, the players are the insured, who play against the bank, that is, the 
insurance company. However, in this case they have no interest in obtaining the prize, 
which represents an accident (or death) followed by an insurance payment. 

For example, if we classify the risks of car insurance into three categories (see 
the following table), and the information handled by the company is as appears in 
the table, the mathematical expectation (ME) of the cost of the accidents will be: 


$$ME = 0.003 \times 7{,}000 + 0.05 \times 3{,}000 + 0.3 \times 500 = 21 + 150 + 150 = \pounds 321.$$
Cost of accident    Probability
7,000               0.003
3,000               0.05
500                 0.3


Consequently the cost of the insurance policy P will be: 
P=ME + overheads + required profit. 
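Because the premium is just a sum of products plus two fixed amounts, it is easy to sketch in code. The snippet below reuses the three probability and cost pairs from the example above; the overheads and profit figures are hypothetical, and the function names are our own illustration, not part of any actuarial package.

```python
# Risk categories as (probability of a claim in the year, average cost in pounds).
# These are the three pairs used in the car-insurance example in the text.
RISKS = [(0.003, 7_000), (0.05, 3_000), (0.3, 500)]


def expected_cost(risks):
    """Mathematical expectation ME: sum of probability x average cost."""
    return sum(p * cost for p, cost in risks)


def premium(risks, overheads, profit):
    """Price of the policy: P = ME + overheads + required profit."""
    return expected_cost(risks) + overheads + profit


print(f"ME = {expected_cost(RISKS):.2f}")  # ME = 321.00
# Hypothetical overheads and profit, chosen only for illustration.
print(f"P  = {premium(RISKS, overheads=60, profit=40):.2f}")  # P  = 421.00
```

Seen from the insured party's side, the expected "winnings" are ME minus P, which is negative whenever overheads and profit are positive: this is precisely why insurance is an unfavourable game on average.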


Most of the time (provided they do not have an accident), insurance can be regarded
as an unfavourable ‘game’ for the insured party, but at a small cost it covers the
risk of having to pay a considerable sum or buy a new car in the event of an
accident. When we take out insurance, by paying a modest sum every year (or a single
payment in the case of travel insurance, for example), we hope to secure a ‘prize’
that will allow us to cope if something unlikely but possible happens to us. In this
sense, insurance can be regarded as a lottery, but one on which it is worthwhile
betting.


Retirement age and pensions 


One of the greatest achievements of modern societies with welfare systems has
been caring for the elderly, who are provided with a pension when they stop
working on reaching a certain age. The debate about whether workers should
extend their working lives, that is, whether they should have to work until an
older age in order to earn the right to a pension, is a recurring theme in most
countries. However seriously and rigorously they are conducted, discussions of
this nature are all based on life expectancy: for how many more years will a
person receive their pension, assuming they do not die before reaching
retirement age?

As is the case with insurance, devising suitable LMTs and calculating life
expectancy are essential in dealing with these issues. However, we not only need
suitable tables but must also analyse them correctly, since, generally speaking,
it is not enough to consider life expectancy at birth; it is also necessary to
analyse life expectancy at the age in which we are interested, or other biometric
functions.
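To see why life expectancy at birth is not the whole story, here is a minimal sketch that computes the remaining life expectancy e_x = T_x / l_x from a life table. The table is entirely hypothetical (survivors out of 100,000 births at ten-year intervals), and approximating the person-years lived in each interval with the trapezoid rule is a simplifying assumption of ours, not the method of any particular official table.

```python
# Hypothetical abridged life table: survivors l_x out of 100,000 births
# at exact ages 0, 10, 20, ..., 100 (illustrative numbers, not real data).
AGES = list(range(0, 110, 10))
SURVIVORS = [100_000, 99_000, 98_500, 97_800, 96_500, 93_500,
             87_000, 74_000, 48_000, 15_000, 0]


def life_expectancy(age, ages=AGES, survivors=SURVIVORS):
    """Remaining life expectancy e_x = T_x / l_x for an age listed in the table.

    Person-years lived in each interval are approximated with the trapezoid
    rule, L = width * (l_start + l_end) / 2, and T_x sums them from age x on.
    """
    i = ages.index(age)  # raises ValueError if the age is not in the table
    person_years = 0.0
    for j in range(i, len(ages) - 1):
        width = ages[j + 1] - ages[j]
        person_years += width * (survivors[j] + survivors[j + 1]) / 2
    return person_years / survivors[i]


e0 = life_expectancy(0)
e60 = life_expectancy(60)
print(f"e_0  = {e0:.1f} years")          # e_0  = 75.9 years
print(f"e_60 = {e60:.1f} more years")    # e_60 = 20.7 more years
print(f"expected age at death for a 60-year-old: {60 + e60:.1f}")  # 80.7
```

With these made-up figures, someone who has already reached 60 can expect to live to about 80.7, well beyond the 75.9 years expected at birth; this is exactly the kind of difference a pension calculation has to take into account.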
Other applications


There are obviously many other applications of LMTs, both for human populations
and beyond. For example, in engineering and mechanics, calculating how long a part
or machine will last before it develops a fault and must be replaced is nothing more
than calculating the useful life of that part. Without a doubt, correct LMTs are
crucial to many industrial processes.

Although the previous example may appear to have little effect on our daily routine,
this could not be further from the truth. Imagine that the machine that could suffer
the fault is our television or washing machine, or perhaps the light bulbs we use.
What, then, are we talking about? Nothing less than the average interval between
repairs to our washing machine or television. Pausing to reflect for a moment, we
observe that the manufacturers know the LMTs for the parts but we do not. What does
this mean? Unfortunately, it means that manufacturers set the guarantees of their
products in line with these life expectancies, which we do not always take into
account. Some pieces of equipment, such as televisions, may suffer fewer faults,
while others, such as washing machines, which are subject to intensive wear, suffer
more. Who hasn’t had the feeling that a fault has occurred just as the guarantee
period has come to an end?

This situation is not due to chance, but to the studies the companies carry out on
the life expectancy of their products. Naturally, the guarantee will be shorter than
the forecast lifetime of the equipment, never longer. Perhaps this gives us a better
understanding of what a guarantee means. Fortunately, probability and statistics can
be understood by anyone who wishes to study them, which allows the authorities,
based on the information provided by experts, to prevent abusive practices and to
set a minimum guarantee period with which manufacturers must comply. Users’
associations also play an important role in the process.
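The relationship between useful life and guarantee period can be sketched with a handful of observed lifetimes. Everything here is an assumption for illustration: the lifetimes are invented, and the rule "offer the longest whole number of years in which at most 10% of units fail" is a hypothetical policy, not how any real manufacturer is known to set its guarantees.

```python
# Hypothetical observed lifetimes (years) of 20 units of the same part.
LIFETIMES = [2.1, 3.4, 4.0, 4.6, 5.0, 5.3, 5.5, 5.9, 6.2, 6.4,
             6.8, 7.0, 7.3, 7.7, 8.1, 8.5, 9.0, 9.6, 10.4, 12.0]


def mean_useful_life(lifetimes):
    """Average lifetime of the observed units."""
    return sum(lifetimes) / len(lifetimes)


def failure_fraction(lifetimes, period):
    """Share of units that fail at or before `period` years."""
    return sum(t <= period for t in lifetimes) / len(lifetimes)


def longest_guarantee(lifetimes, max_fraction, max_years=20):
    """Hypothetical rule: longest whole number of years whose failure
    fraction stays at or below `max_fraction`."""
    best = 0
    for years in range(1, max_years + 1):
        if failure_fraction(lifetimes, years) <= max_fraction:
            best = years
        else:
            break  # the failure fraction can only grow with the period
    return best


print(f"mean useful life: {mean_useful_life(LIFETIMES):.2f} years")  # 6.74 years
g = longest_guarantee(LIFETIMES, max_fraction=0.10)
print(f"guarantee: {g} years, failures within it: "
      f"{failure_fraction(LIFETIMES, g):.0%}")  # guarantee: 3 years, ... 5%
```

Even in this toy case, the guarantee (3 years) comes out well below the average useful life (about 6.7 years), which is the asymmetry described above.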


Probability and statistics in medical practice 


The results of clinical analyses provide doctors with a series of data which, after
being processed with probabilistic and statistical procedures, are presented in a
comprehensible format to help them take the appropriate measures. The final decision
rests with the doctor; the data, the way they have been processed and the format in
which they are supplied are a tool to support that decision. However, some of this
information may determine the decisions taken by the doctor or impose rules on how
they act. For example, we know that certain ‘warnings’ accompanying measurements
indicate that something is ‘outside its normal limits’. What does this mean and how
is it decided?

In medicine, the term normal limits refers to the range of values of a given property
within which the majority of the population to which we belong falls, and these limits
are determined with the help of the central limit theorem and the normal distribution.
The procedure involves determining the values between which the average is to be
found with a certain probability, the problem studied by Laplace. In statistics, the
values between which the average should lie are now referred to as confidence
intervals.
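A minimal sketch of the idea: by the central limit theorem the sample mean is approximately normal, so a 95% confidence interval for the mean is the sample mean plus or minus about 1.96 standard errors. The measurements below are hypothetical, and we use the normal approximation throughout; for small samples a Student t quantile would normally replace the normal one.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical measurements of some clinical property in a sample (arbitrary units).
SAMPLE = [88, 92, 95, 85, 90, 99, 87, 93, 91, 96, 84, 94]


def mean_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the population mean.

    The sample mean is treated as approximately normal with standard error
    s / sqrt(n), so the interval is mean +/- z * s / sqrt(n).
    """
    n = len(values)
    centre = mean(values)
    std_error = stdev(values) / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    return centre - z * std_error, centre + z * std_error


low, high = mean_confidence_interval(SAMPLE)
print(f"95% confidence interval for the mean: ({low:.1f}, {high:.1f})")
```

Raising `level` widens the interval: more certainty that the average lies inside it is paid for with less precision.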

A similar situation occurs when a paediatrician states that a child's weight and
height fall in the 85th and 95th percentiles. The percentile charts that paediatricians
use for each age refer to the weights and heights of the boys and girls who make up
the reference population. If a child is in the 80th percentile for height and the 75th
for weight, this means that 80% of children of the same age are shorter than the
child, and 75% weigh less. A good reference population is of fundamental importance.
If the population used to determine the normal ranges of heights and weights differs
considerably from reality, the results will be invalid; for this reason there are
separate reference bands for boys and girls,
since the reference population is defined by gender.
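The percentile rank itself is a simple count. In this sketch the reference heights are hypothetical, and we use the "strictly below" convention; other conventions (for example, counting half of the tied values) give slightly different results.

```python
from bisect import bisect_left

# Hypothetical heights (cm) of a reference population of children of one age and sex.
REFERENCE_HEIGHTS_CM = [98, 99, 100, 101, 101, 102, 103, 103, 104, 105,
                        105, 106, 107, 107, 108, 109, 110, 111, 112, 114]


def percentile_rank(reference, value):
    """Percentage of the reference population strictly below `value`."""
    ordered = sorted(reference)
    return 100 * bisect_left(ordered, value) / len(ordered)


print(percentile_rank(REFERENCE_HEIGHTS_CM, 110))  # 80.0
```

A child measuring 110 cm would therefore be in the 80th percentile of this made-up reference population. Measured against a different reference population, the same child could land in a very different percentile, which is why the choice of reference matters so much.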


Probability and DNA 


Since the second half of the 1980s, the use of deoxyribonucleic acid (DNA) profiles 
has become common in legal cases to prove paternity, the family relationship between 
different people, and the innocence or guilt of alleged criminals. 

For paternity testing, the idea is to compare the DNA of the children with that of
the alleged parents; in criminal cases, the idea is to compare DNA from samples taken
at the crime scene with that of the suspects. These tests have undergone improvements
and today form a fundamental part of many criminal trials, owing to the large
variability between the DNA profiles of different people, even those who belong to
the same ethnic group. These techniques have become known to the general public for
various reasons, such as the notoriety of certain cases in which tests of this kind
have been used, or television series such as CSI. What is less well known, however,
is that behind the preparation of these DNA profiles lie elaborate probabilistic
tools that have given rise to forensic statistics.

One of the first trials in which DNA testing was used took place in England in
1987, when a young man aged 17 was accused of raping and murdering two girls,
one in 1983 and the other in 1987. Certain circumstantial evidence and the history
of the accused seemed to suggest he was guilty, with little room for doubt. He
requested a blood test and his DNA was compared with the sperm samples found on
the two girls. The conclusion was striking: the two girls had been raped by the same
person, but that person was not the accused.

Legal trials that attempt to determine those responsible for a crime measure a 
property that makes it possible to establish the potential relationship between the 
accused and the crime. The idea itself is simple, but the problem is putting it into 
practice. It is a matter of establishing a property that makes it possible to accurately 
identify someone. Fingerprints are one such example and are used where possible 
since they offer a good way to identify people. However, no ‘professional’ criminal 
will leave fingerprints and, furthermore, these can be easily destroyed or suffer from 
deterioration. 

DNA offers an improvement over fingerprints because it identifies a person almost
unequivocally (except in the case of identical, or monozygotic, twins, who share DNA
but, surprisingly, not fingerprints), and it has various other advantages. The first
is that it is present in the nucleus of nearly all cells, so a minimal quantity of
organic material (a single hair, a drop of blood, a fragment of skin) is enough to
measure it. The second is that it is highly stable and does not change over a
considerable period of time. The only drawbacks are the technological limitations
on measuring it accurately: only certain genetic markers can be analysed, and while
DNA is practically unique, different people may share a limited number of markers.

Here we will not discuss the way in which DNA is analysed, but we will describe 
how the calculation of probabilities contributes to making decisions on the crime in 
question. To visualise this, we shall consider a simplified situation, which is unrealistic 
but useful for our purposes. A crime has been committed and the police collect 
DNA samples at the scene and arrest a suspect, from whom they also take a DNA 
sample. Assume that each DNA sample provides a single piece of data. Regardless of
whether other possibilities arise during the trial, for us there are only two alternatives:
the suspect is guilty (G) or innocent (I), and the decision will be based on the DNA
samples. The aim is to distinguish between the following two hypotheses:

1. The samples come from the same individual (G).
2. The samples come from different individuals (I).


Let us use Ev (evidence) to denote the coincidence of the DNA samples collected
at the scene of the crime and from the suspect, and S to denote the rest of the
information available in the case.

Here we must make use of Bayes’ theorem. Given two events A and B, and writing
prob(A | B) for the conditional probability of event A given event B, then, provided
that prob(B) ≠ 0:

$$\mathrm{prob}(A \mid B) = \frac{\mathrm{prob}(B \mid A)\,\mathrm{prob}(A)}{\mathrm{prob}(B)}.$$

Using the notation for conditional probability and Bayes’ theorem, the ratio of the
probability of guilt to that of innocence is given by:

$$\frac{\mathrm{prob}(G \mid Ev, S)}{\mathrm{prob}(I \mid Ev, S)} = \frac{\mathrm{prob}(Ev \mid G, S)}{\mathrm{prob}(Ev \mid I, S)} \times \frac{\mathrm{prob}(G \mid S)}{\mathrm{prob}(I \mid S)}.$$

This expression implies that the two fundamental probabilities that must be 
determined are the probability of the evidence Ev if the suspect is guilty and if they 
are innocent. It is not enough just to consider the probability of the evidence when 
the suspect is innocent and conclude that if the value of this probability is small, 
it offers strong proof of their guilt. The probability of the evidence if the suspect 
is guilty must also be taken into consideration. It should be noted that these two 
probabilities are not complementary, and both may be extremely small. 
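A short numerical sketch shows why a tiny probability of the evidence under innocence is not, on its own, proof of guilt. It applies the odds form of Bayes' theorem: posterior odds equal the likelihood ratio times the prior odds supplied by the other evidence. All the figures are hypothetical: a match is certain if the suspect is the source, an unrelated person matches with probability one in a million, and the other evidence only tells us that the culprit is one of a million people.

```python
def posterior_odds(p_ev_if_guilty, p_ev_if_innocent, prior_odds):
    """Odds form of Bayes' theorem.

    posterior odds (guilt : innocence) =
        [prob(Ev | guilty) / prob(Ev | innocent)] * prior odds from the other evidence
    """
    if p_ev_if_innocent <= 0:
        raise ValueError("prob(Ev | innocent) must be positive")
    likelihood_ratio = p_ev_if_guilty / p_ev_if_innocent
    return likelihood_ratio * prior_odds


def odds_to_probability(odds):
    """Convert odds a : 1 into a probability a / (1 + a)."""
    return odds / (1 + odds)


# Hypothetical figures, chosen only to illustrate the reasoning.
odds = posterior_odds(1.0, 1e-6, prior_odds=1 / 999_999)
print(f"probability of guilt: {odds_to_probability(odds):.2f}")  # 0.50
```

Although the evidence would arise by chance only once in a million times, the probability of guilt here is only about one half, because the other information in the case gives very low prior odds. Concluding "one in a million, so almost certainly guilty" would ignore both the probability of the evidence under guilt and the other evidence.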

Subtleties of this nature are examined by lawyers and prosecutors, and
misinterpreting subtle probabilistic concepts such as these can lead to an innocent
person being convicted or a guilty person being set free. The techniques used by
lawyers have little to do with probability itself and are instead related to the
interpretation of the results. The previous expression also shows that the ratio of
guilt to innocence depends on the other evidence (S), which must never be ignored.

Determining the probabilities of the evidence Ev under the hypotheses of guilt and
innocence is one of the controversial aspects of legal cases that use DNA testing.
Procedures have been developed to determine them as objectively as possible, such as
carrying out the calculations on reference populations drawn from databases based
on ethnicity
or gender. However, in many cases the problem remains unresolved, since the methods
used to build the databases, and the calculations carried out with them, are not
always transparent. Additionally, in DNA testing it is customary to measure multiple
markers, so the probability of a coincidence at all of them can be obtained by
multiplying the individual probabilities, provided their independence is accepted;
this is another controversial aspect of their use in trials.
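The multiplication rule for several markers can be sketched in a few lines. The genotype frequencies below are hypothetical, five markers are used only to keep the example short, and the result is meaningful only under the independence assumption that the text identifies as controversial.

```python
from math import prod

# Hypothetical frequencies, in a reference database, of the genotype observed
# at each of five markers (illustrative values only).
MARKER_FREQUENCIES = [0.10, 0.05, 0.08, 0.12, 0.07]


def random_match_probability(frequencies):
    """Probability that an unrelated person matches at every marker.

    Multiplying is valid only if the markers are assumed to be independent.
    """
    return prod(frequencies)


rmp = random_match_probability(MARKER_FREQUENCIES)
print(f"{rmp:.2e}")            # 3.36e-06
print(f"1 in {1 / rmp:,.0f}")  # 1 in 297,619
```

If the markers were in fact correlated (for instance, within a population sub-group), the true probability of a coincidence could be considerably larger than this product suggests.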

DNA testing is also important when it comes to identifying people in military 
conflicts, natural disasters or accidents. 
Epilogue
Chance: Tamed at Last? 


This brings us to the end of our account of the taming of chance, which has taken
us from supernatural explanations of what was not understood to the taming of
complex events by transforming them into parameters that allow us to control them
and better understand the underlying trends. As the Indian statistician C.R. Rao
once remarked, “Chance deals with the order present in disorder, while chaos deals
with the disorder present in order.”

It has been a long and complex journey for humanity, and one that has not yet
reached its end, since hardly anything related to probability is intuitive. We come
up against a logic that differs from ‘yes or no’, ‘all or none’, which, whether for
genetic reasons or because of our cultural milieu, we understand much better. We
accept without difficulty that when we flick a switch a light comes on, and when we
flick it again it goes off. It is much harder to understand if we do this several
times and the light comes on only some of the time. We must then evaluate this
uncertainty.

Chance consists of uncertainty, of not knowing what is going to happen, of the
insecurity that is so disliked in our ‘developed’ societies, in which we like to have
everything under control. Although we may be living in the historical period in
which the greatest number of social variables have been brought under control,
uncertainty (about the climate, contagious diseases, accidents and so on) has come to
irritate us more, even though in a certain sense it is part of life. It is true that
life is hard when certain phenomena occur in a completely unpredictable manner. At
the other end of the spectrum, however, life would be very dull indeed if everything
were completely deterministic and predictable. Fortunately, the future of life in an
organised society is a unique mixture of both, which means that, as a well-known
statistician used to say, “Life will be complicated but not uninteresting.”
Taking Chances


The rules of probability 


The taming of chance by reducing it to numbers is one of the
most formidable achievements of the human intellect. Where
there was once only a choice between the extremes of absolute
certainty and radical doubt, a landscape of infinite gradations
between the two has opened up before us. Today, the field
of probability theory constitutes one of the most fascinating
branches of modern mathematics.


